Preface


The history of nonparametric methods based on ranks is rather brief, 
extending back roughly forty years. I would like to mention a few of the 
major contributions. The list is subjective and by no means exhaustive. The 
field continues to be an active area of research and much is owed to a few 
key contributions. 

The systematic development and assessment of the nonparametric meth- 
ods studied in this book began with the work of F. Wilcoxon in 1945 and 
H. B. Mann and D. R. Whitney in 1947. For the next decade, nonparamet- 
ric tests for location were studied using Pitman’s asymptotic efficiency to 
assess their local power properties. In a series of papers, J. L. Hodges and 
E. L. Lehmann discovered the surprising result that rank tests suffer 
negligible efficiency loss when compared to the t-test at the normal model
and may be much more efficient at heavy-tailed models. At about this time, 
nonparametric tests began to gain some acceptance from data analysts and 
found their way into the final chapter of elementary textbooks. The first 
book on applied nonparametric statistics was published by S. Siegel in
1956. It was a great success, especially among behavioral scientists, and, in 
fact, it ranked second on the list of most-cited mathematics and statistics 
books, 1961-1972, with 1824 citations. Since 1970, new books on non- 
parametrics have appeared at the rate of roughly one per year. 

In the 1960s, Hodges and Lehmann derived point estimates and confi- 
dence intervals for location parameters from rank test statistics. They 
further showed that the estimation methods inherit their efficiency proper- 
ties from the parent test statistics. It was also found that these estimates are 
robust according to the new criteria proposed by J. W. Tukey, P. J. Huber, 
and F. Hampel for assessing stability of estimates. During this same period,
J. Hajek developed a new and powerful approach to the asymptotic 
distribution theory needed for the construction of general rank score test 
statistics. 

Aligned rank tests for analysis of designed experiments were introduced 
in the early 1960s by Hodges and Lehmann and extensively developed by 
M. L. Puri and P. K. Sen. J. N. Adichie proposed and studied rank tests and the corresponding estimates for simple regression models.

In the 1970s the previous work was consolidated and extended to rank-based tests and estimates in the linear model. Much of the asymptotic distribution theory needed for the linear model derives from basic results due to J. Jureckova published just prior to 1970. Based on her work, it is possible to develop unified approaches based on ranks to the analysis of complex data sets. Hopefully, in the 1980s we will see the computer implementation and more widespread use of these efficient and robust statistical methods.

The major goal of this book is to develop a coherent and unified set of statistical methods (based on ranks) for carrying out inferences in various experimental situations. The book begins with the simple one-sample location model and progresses through the two-sample location model, the one- and two-way layouts to the general linear model. A final chapter develops methods for the multivariate location models. In all cases, testing and estimation are developed together as an interconnected set of methods for each model.

The basic tools and results from mathematical statistics are introduced as they are needed. The tools fall into two groups: tools to assess the statistical properties of the procedures and tools to assess the stability properties. In the former case, the major tools are asymptotic relative efficiency and asymptotic local power. In the latter case, the main tools are the influence curve and the tolerance (breakdown). The stability criteria are central to the modern theory of robust statistical methods. The statistical efficiency properties are described for all the methods introduced in the book. The robustness properties are developed extensively in the one-sample location model and discussed briefly for the simple regression model. The goal is to help the student develop a working knowledge of both efficiency and robustness.

The text is organized around statistical models because this is the context in which statistical inference and data analysis are carried out. By acquiring a firm understanding of the methods and their properties in the simple models, the reader will be prepared to deal with the methods in the general linear model. We provide a rigorous development of methods based on rank sums. These methods include the Wilcoxon signed rank statistic, the Mann-Whitney-Wilcoxon statistic, the Kruskal-Wallis statistic, the Friedman statistic, and rank tests based on residuals in the linear model. The more general sums of rank scores are discussed and integrated into the discussion with references to the sources of their rigorous development. We
have concentrated on rank sums for two reasons: they are the most
commonly used by researchers, and their properties can be explored with 
the least amount of mathematical sophistication. 

The linear model, which includes multiple regression and analysis of 
variance designs, is not generally treated systematically in texts on nonpara- 
metric statistics, applied or theoretical. This is a serious omission since most 
data analysis is carried out in the context of the linear model. Part of the 
reason that serious researchers have not used nonparametric methods more 
extensively is the lack of their systematic development for the linear model. 
The present text provides a development of these methods. Furthermore, in 
the near future, there will be statistical software available to implement 
these methods. The Minitab statistical computing system, which already 
includes the major nonparametric methods for the simple designs, will 
include a rank regression command that will provide both rank tests and 
estimates. Hence, the procedures developed in the text will be fully opera- 
tional and can be used by researchers for analysis in complex data sets. 

The book contains many exercises and problems. Major results in 
exercises are explicitly presented. Thus, the equations are available to the 
reader who does not want to take the time to derive them. An appendix of 
important results (without proofs) from the main body of mathematical 
statistics is provided. All major procedures are illustrated on data sets. 

The first three chapters cover the one- and two-sample location models. 
Finite sample and asymptotic distribution theory is developed. Tests, point 
estimates, and confidence intervals are derived. Their properties are ex- 
plored through asymptotic efficiency, influence curves, and tolerance 
(breakdown). This material can be covered in a one-semester course at the 
first- or second-year graduate level. The prerequisites are an introductory 
course in mathematical statistics and a course in advanced calculus. 

Most of the robustness material is located at the end of sections. If more 
statistical inference is desired, then by skipping the robustness material, 
topics in the one- and two-way layout designs (Chapter 4) can be covered 
or the multivariate versions of the one- and two-sample univariate tests 
(Chapter 6) can be covered. The rank methods in the linear model (Chapter 
5) can be developed in a follow-up seminar. The material in Chapter 5 
requires a deeper background in statistics. The reader should have prior 
knowledge of the linear model in matrix notation. 

The book has grown out of lectures given at Penn State over the last 15 
years. My interest in the subject derives from discussions and work at the 
University of Iowa with Bob Hogg and Tim Robertson. I thank them for 
many hours of stimulating conversations. Graduate students whom I have 
worked with over the years provided the continual challenges needed to sort out many of the ideas. I would like to especially acknowledge my appreciation to Joe McKean and Jay Aubuchon.

I would also like to thank Bill Harkness, head of the Statistics Department at Penn State, for continual support. The Office of Naval Research, which sponsored part of the research that appears in Chapter 5, is gratefully acknowledged. Bea Shube, at John Wiley, has been most helpful throughout the work on the book. Finally, thanks to the typists Jane Uhrin, Peggy Lynch, Bonnie Cain, Barbara Itinger, and especially Bonnie Henninger, who struggled mightily to transcribe my notes.

Thomas P. Hettmansperger
CHAPTER 1


The One-Sample Location 
Model with an Arbitrary, 
Continuous Distribution 


1.1. INTRODUCTION 

We begin by considering observations drawn from a single population 
about which we wish to make a minimal number of assumptions. We will 
only suppose that the underlying distribution is continuous and that we 
wish to make statistical inferences about the location of the population. 

Hence we need to define a measure of location for an arbitrary, 
continuous distribution. Since we make no shape assumptions on the 
distribution, the two most common measures, the mean and the median, 
will not necessarily coincide. In the case of symmetry, they do coincide and 
naturally locate the center of the underlying population. 

The median has two advantages over the mean in the general setting. 
The first is that the median always exists, being approximately that point 
which divides the population distribution in half. The mean, on the other 
hand, need not exist, as in the case of a Cauchy distribution. Second, the 
median is very resistant to slight perturbations in the underlying distribu- 
tion. Hence, if there are outliers or gross errors present in the population, 
they will have little influence on the median but may produce extreme 
changes in the population mean. Later in the chapter, we consider in 
greater detail the stability properties of the mean and median. For now, we 
will simply use the population median $\theta$ as the measure of location.

In this chapter we introduce a test, confidence interval, and point 
estimate of $\theta$ and investigate some of the properties of these procedures. Many of the ideas presented here are used throughout the text and the simple location model provides a nice context for their discussion. This chapter contains discussions of asymptotic approximations for the significance level and power of a test, the consistency of a test, and the derivation of estimation procedures from a test statistic.

Suppose $X$ is a random variable with an arbitrary, continuous distribution function $F(x) = P(X \le x)$. We will refer interchangeably to the median of $X$ or the median of $F$. The median is defined as a point $\theta$ such that

$$P(X < \theta) \le 1/2 \le P(X \le \theta). \tag{1.1.1}$$

In general, there may be many such values of $\theta$ that qualify as the median (Exercise 1.8.1). When this is the case, the ambiguity should be removed by specifying a rule for the selection of a specific median.

In this book, we assume that the median is unique and that $F$ is absolutely continuous with a density function $f(x) = F'(x)$. We denote this class of distributions by

$$\Omega_0 = \{F : F \text{ absolutely continuous and } F(0) = 1/2, \text{ uniquely}\}. \tag{1.1.2}$$

The sampling model consists of a random sample $X_1, \ldots, X_n$ of independent, identically distributed (i.i.d.) random variables, each distributed as $F(x - \theta)$, $F \in \Omega_0$. The first statistical inference problem that we consider is a test of

$$H_0: \theta = 0 \quad \text{versus} \quad H_A: \theta > 0. \tag{1.1.3}$$

This is the most general one-sided hypothesis that we need to consider. The test of $H_0: \theta = \theta_0$, $\theta_0$ specified, against $H_A: \theta > \theta_0$ can be reduced to (1.1.3) by noting that $Y_1 = X_1 - \theta_0, \ldots, Y_n = X_n - \theta_0$ is a sample from $F \in \Omega_0$ under $H_0$. Furthermore, we will usually discuss one-sided hypothesis testing because it is then generally clear how to develop the corresponding two-sided procedures. Finally, note that both the null and alternative hypotheses are composite; $H_0$ only specifies that the sample comes from some arbitrary, absolutely continuous distribution with unique median 0, and $H_A$ specifies an arbitrary distribution with median greater than 0.

Example 1.1.1. In the early 1950s G. V. T. Matthews and others carried out interesting experiments on bird navigation. See Matthews (1952, 1974) for an account of this research. In some of the early work with homing pigeons, the birds were trained to "home" from sizeable distances out along a specific training line. To determine whether the birds could find their way
home systematically from unfamiliar points, experiments were conducted in
which the birds were taken out on lines at 90° and 180° from the training 
line and released. One measurement of interest was the angle between the bird's line of flight as he disappeared over the horizon and the homing line. These error angles were measured between 0° and 180°, so lines above or below the homing line were not distinguished. Let $\theta$ denote the population median error angle. If the birds are not homing, $\theta = 90°$. One research hypothesis specified that the birds were using the sun to navigate, and in fact $\theta < 90°$. Hence, we want to construct a test of $H_0: \theta = 90°$ versus $H_A: \theta < 90°$. If we consider only the data on birds released on sunny days, we cannot conclude that the birds are using the sun to navigate if we reject $H_0: \theta = 90°$. We may only conclude the birds are homing. The data are
given in Example 1.5.1. In Chapter 3 we introduce two-sample tests which
can be used to compare birds released on sunny and cloudy days. In that 
case we randomly assign birds to sunny or cloudy days and compare the 
results. 


1.2. THE SIGN TEST AND ITS DISTRIBUTION 

In this section we introduce the simplest test possible for (1.1.3): the sign 
test. This test is one of the oldest statistical procedures, dating back to 
Arbuthnott’s research in 1710 on whether the proportion of male births 
exceeded 1/2 in London. Although the sign test is very simple, several of the
later rank tests which we will study can be put into a sign test or counting 
form. For this reason we develop the properties of the sign test in detail. 
This will make the analysis of some aspects of the later rank tests almost 
routine. 

Let

$$S = \#(X_i > 0), \quad i = 1, \ldots, n, \qquad = \sum_{i=1}^{n} s(X_i), \tag{1.2.1}$$

where $s(x) = 1$ if $x > 0$ and 0 otherwise. The rule is to reject $H_0: \theta = 0$ in favor of $H_A: \theta > 0$ if $S \ge k$. The critical value $k$ is determined so that $P_{H_0}(S \ge k) = \alpha$, the significance level of the test. Hence we must first determine the distribution of $S$ under $H_0$.

Under $H_0: \theta = 0$, $s(X_1), \ldots, s(X_n)$ are i.i.d., each binomial with parameters 1 and $p = P(X > 0) = 1 - F(0) = 1/2$, written $B(1, 1/2)$. Hence, $S$ is the sum of $n$ i.i.d. $B(1, 1/2)$ random variables and has a $B(n, 1/2)$ distribution. The critical value $k$ is found in a binomial table.
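
When a binomial table is not at hand, the critical value can be computed directly from the $B(n, 1/2)$ null distribution. The following is a minimal sketch, not part of the original text. It assumes that, because the binomial is discrete, we take the smallest $k$ whose upper tail probability does not exceed the nominal level; the function names and the ten data values are invented for illustration.

```python
from math import comb


def sign_statistic(xs):
    """S = number of strictly positive observations, as in (1.2.1)."""
    return sum(1 for x in xs if x > 0)


def upper_tail(n, k):
    """P(S >= k) when S has a B(n, 1/2) distribution."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def critical_value(n, alpha):
    """Smallest k with P(S >= k) <= alpha under H0 (an assumed convention).

    The value n + 1 is returned when no rejection region of that size exists.
    """
    for k in range(n + 2):
        if upper_tail(n, k) <= alpha:
            return k


# Hypothetical data, used only to exercise the functions.
sample = [1.2, -0.4, 3.1, 0.7, 2.5, -1.1, 0.9, 1.8, 0.3, 2.2]
n = len(sample)
S = sign_statistic(sample)
k = critical_value(n, 0.05)
print("S =", S, "k =", k, "exact level =", upper_tail(n, k), "reject:", S >= k)
```

Because the null distribution does not involve $F$, the same critical value serves for every sample of size $n$.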
Note that the distribution of $S$ does not depend on which $F \in \Omega_0$ we are sampling from and hence $k$, the critical value, can be found without knowing $F$. It is in this sense that we say $S$ is distribution free or nonparametric under $H_0: \theta = 0$. On the other hand, under $H_A: \theta > 0$, $S$ still has a $B(n, p)$ distribution but now

$$p = P(X > 0) = 1 - F(-\theta), \tag{1.2.2}$$

which depends on $F$. Hence $S$ is not distribution free under $H_A$.

Since under either hypothesis $S = \sum s(X_i)$ is the sum of i.i.d. $B(1, p)$ random variables with $\mathrm{Var}[s(X_i)] = p(1 - p) < \infty$, we can apply the Central Limit Theorem (Theorem A8 in the Appendix) to assert that

$$\frac{S - ES}{\sqrt{\mathrm{Var}\,S}} \tag{1.2.3}$$

has an approximate standard normal distribution, that is, a normal distribution with mean 0 and variance 1, provided $0 < p < 1$. This convergence in distribution may also be denoted by

$$\frac{S - ES}{\sqrt{\mathrm{Var}\,S}} \xrightarrow{D} Z \sim n(0, 1).$$

From (1.2.2) and since $S$ has a $B(n, p)$ distribution, it follows that

$$ES = n[1 - F(-\theta)], \qquad \mathrm{Var}\,S = n[1 - F(-\theta)]F(-\theta), \tag{1.2.4}$$

with $ES = n/2$ and $\mathrm{Var}\,S = n/4$ under $H_0: \theta = 0$. These results can then be used to approximate the critical value $k$ when a binomial table is not available:

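$$\alpha = P(S \ge k) = P\left(\frac{S - ES}{\sqrt{\mathrm{Var}\,S}} \ge \frac{k - n/2}{\sqrt{n/4}}\right) \doteq 1 - \Phi\left(\frac{k - n/2}{\sqrt{n/4}}\right), \tag{1.2.5}$$

where $\Phi(\cdot)$ denotes the standard normal distribution function. Let $Z_\alpha$ denote the upper $\alpha$ percentile of the standard normal distribution, that is, $\alpha = 1 - \Phi(Z_\alpha)$; then

$$\frac{k - n/2}{\sqrt{n/4}} \doteq Z_\alpha \tag{1.2.6}$$

and $k \doteq n/2 + Z_\alpha \sqrt{n}/2$.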

As can be seen from Fig. 1.1, the accuracy of the approximation can be 
improved by correcting for the discreteness of the binomial distribution. 
Hence (1.2.6) becomes, after subtracting 0.5 from $k$,

$$k \doteq n/2 + 0.5 + Z_\alpha \sqrt{n}/2. \tag{1.2.7}$$

The corrected approximation to $\alpha$ in (1.2.5) is quite accurate for sample sizes as small as 3 or 4. This is because of the symmetry of the binomial distribution under $H_0: \theta = 0$. Later, rank tests will be seen to have symmetric distributions under $H_0: \theta = 0$ and again the corrected normal approximations for the significance level will be surprisingly accurate for small
samples. Table 1.1 illustrates the approximation for the sign test. Sign test 
computations are illustrated in Example 1.5.1. 
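
The quality of the continuity-corrected approximation is easy to check numerically. The sketch below is an illustration added here, not the author's computation. It compares the exact $B(n, 1/2)$ lower tail with the corrected normal value for $n = 5$, and evaluates the approximate critical value $n/2 + 0.5 + Z_\alpha\sqrt{n}/2$. The function names are assumptions, and the normal distribution function is obtained from the error function. Computed values may differ from tabled values in the third or fourth decimal place if the table was built from a rounded normal argument.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def exact_lower_tail(n, k):
    """P(S <= k) for S distributed B(n, 1/2)."""
    return sum(comb(n, j) for j in range(0, k + 1)) / 2 ** n


def approx_lower_tail(n, k):
    """Normal approximation to P(S <= k) with the 0.5 continuity correction."""
    return phi((k + 0.5 - n / 2) / sqrt(n / 4))


def approx_critical_value(n, z_alpha):
    """Approximate k = n/2 + 0.5 + z_alpha * sqrt(n) / 2 (not yet rounded)."""
    return n / 2 + 0.5 + z_alpha * sqrt(n) / 2


n = 5
for k in range(3):
    print(k, exact_lower_tail(n, k), round(approx_lower_tail(n, k), 4))

# With n = 10 and z = 1.645 the formula gives a value between 8 and 9;
# rounding up to the next integer is a convention assumed here.
print(approx_critical_value(10, 1.645))
```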


Table 1.1. Normal Approximation to the Binomial,
p = 1/2 and n = 5, with Continuity Correction

                      k
Function        0         1         2
P(S ≤ k)      .03125    .1875     .5
Φ(·)          .0367     .1867     .5
1.3. CONSISTENCY OF A STATISTICAL TEST

In this section we discuss the idea of consistency of a test. This corresponds to the idea of consistency or convergence in probability of an estimator and depends for its demonstration on Chebyshev's inequality (see Theorem A4 in the Appendix). Consistency of a test is an asymptotic property and thus reflects the behavior of the test for large sample sizes.

A test is consistent for some alternative when the power to detect that alternative tends to one as the sample size increases and the significance level is bounded away from zero. Any reasonable test will be consistent for some set of alternatives. We present a theorem that helps define that set.

In this book the null hypothesis is generally composite and we let $\Omega_{\text{null}}$ denote the set of distributions specified by the null hypothesis. Likewise we let $\Omega_{\text{alt}}$ denote the composite alternative. Then we are interested in testing

$$H_0: G \in \Omega_{\text{null}} \quad \text{versus} \quad H_A: G \in \Omega_{\text{alt}}, \tag{1.3.1}$$

where $G$ denotes the sampled distribution. We now suppose that the test statistic $V_n$, based on a sample size of $n$, satisfies

$$V_n \xrightarrow{P} \mu(G) \quad \text{as } n \to \infty, \tag{1.3.2}$$

Definition A1 in the Appendix, and that the function $\mu(\cdot)$ satisfies

$$\mu(G) = \mu_0 \quad \forall\, G \in \Omega_{\text{null}}, \qquad \mu(G) > \mu_0 \quad \forall\, G \in \Omega_c \subset \Omega_{\text{alt}}. \tag{1.3.3}$$

Hence $\mu(G)$, the stochastic limit of the test statistic, separates the null hypothesis from a subclass of the alternative hypothesis. It is this ability of the test to distinguish the null from alternative hypotheses that determines its consistency. The class $\Omega_c$ defined by $\mu(G)$ in (1.3.3) is called the consistency class of $V_n$.

The nonparametric test statistics developed in this book are generally asymptotically normally distributed. Recall, for example, the sign test and (1.2.3). Hence it is possible to construct the critical value $k_n$ in such a way that

$$\alpha_n = P_G(V_n \ge k_n) \to \alpha \quad \text{as } n \to \infty, \quad \forall\, G \in \Omega_{\text{null}}. \tag{1.3.4}$$

We will say a test that satisfies (1.3.4) is asymptotically size $\alpha$.

The following theorem combines the stochastic convergence and separation in (1.3.2) and (1.3.3) with asymptotic normality to establish the asymptotic size and consistency of a test. The assumption of asymptotic normality is stronger than needed since consistency is basically a property of convergence in probability. The reader is referred to a paper by Lehmann (1951) which establishes consistency under weaker assumptions essentially requiring Chebyshev's inequality.

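Theorem 1.3.1. Suppose $V_n$ is a test statistic for (1.3.1) that rejects $H_0$ for large values and satisfies (1.3.2) and (1.3.3). Suppose further that there is a constant $\sigma_0$ such that

$$\frac{\sqrt{n}\,(V_n - \mu_0)}{\sigma_0} \xrightarrow{D} Z \sim n(0, 1) \quad \forall\, G \in \Omega_{\text{null}}. \tag{1.3.5}$$

Then there exists a sequence of critical values $\{k_n\}$ such that $V_n$ is asymptotically size $\alpha$ and

$$P_G(V_n \ge k_n) \to 1 \quad \forall\, G \in \Omega_c.$$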

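Proof. Let $Z_\alpha$ be the upper $\alpha$ percentile for the standard normal distribution and define

$$k_n = \mu_0 + \frac{Z_\alpha \sigma_0}{\sqrt{n}}. \tag{1.3.6}$$

Now, for $G \in \Omega_{\text{null}}$,

$$\alpha_n = P_G(V_n \ge k_n) = P_G\left(\frac{\sqrt{n}\,(V_n - \mu_0)}{\sigma_0} \ge Z_\alpha\right);$$

hence $\alpha_n \to \alpha$ by (1.3.5) and $V_n$ is asymptotically size $\alpha$. Next fix $G^* \in \Omega_c$ and define

$$\epsilon = \frac{\mu(G^*) - \mu_0}{2}. \tag{1.3.7}$$

Now, by (1.3.3), $\epsilon > 0$ and for sufficiently large $n$, $k_n < \mu_0 + \epsilon$ because $k_n \to \mu_0$ from (1.3.6). Further, from (1.3.7), $\mu_0 = \mu(G^*) - 2\epsilon$; hence

$$k_n < \mu(G^*) - \epsilon. \tag{1.3.8}$$

Now $|V_n - \mu(G^*)| < \epsilon$ implies $V_n - \mu(G^*) > -\epsilon$, which implies $V_n > \mu(G^*) - \epsilon$, which implies $V_n > k_n$, the last implication following from (1.3.8). Hence we have

$$P_{G^*}(|V_n - \mu(G^*)| < \epsilon) \le P_{G^*}(V_n \ge k_n) \le 1.$$

By (1.3.2), the left side tends to 1; hence $P_{G^*}(V_n \ge k_n) \to 1$. Since $G^*$ was arbitrarily fixed in $\Omega_c$, the theorem is proved.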

Example 1.3.1. The consistency of the sign test.

Let $\bar{S} = S/n$, where $S$ is given in (1.2.1). From (1.2.4) and Chebyshev's inequality (Theorem A4),

$$\bar{S} \xrightarrow{P} \mu(F, \theta) = 1 - F(-\theta),$$

where $F \in \Omega_0$. Hence we have

$$\mu(F, \theta) = 1/2 \quad \forall\, F \in \Omega_0,\ \theta = 0, \qquad \mu(F, \theta) > 1/2 \quad \forall\, F \in \Omega_0,\ \theta > 0,$$

and the sign test separates the null hypothesis $F \in \Omega_0$, $\theta = 0$, from the whole alternative $F \in \Omega_0$, $\theta > 0$. The consistency set for the sign test is the class of absolutely continuous distributions with unique positive median. The required asymptotic normality follows from (1.2.3) with $\mu_0 = 1/2$ and $\sigma_0 = 1/2$.
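
The consistency of the sign test can be seen numerically by computing its exact power for increasing $n$. The sketch below is an added illustration under stated assumptions: $F$ is taken to be the standard normal distribution, the shift is $\theta = 0.5$, and the nominal level is 0.05 with the conservative critical value (smallest $k$ whose null tail probability does not exceed 0.05). None of these choices comes from the text.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def upper_tail(n, k, p):
    """P(S >= k) for S distributed B(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def critical_value(n, alpha):
    """Smallest k with null tail probability P(S >= k) at most alpha."""
    for k in range(n + 2):
        if upper_tail(n, k, 0.5) <= alpha:
            return k


def power(n, theta, alpha=0.05):
    """Exact power of the sign test at shift theta, assuming normal F."""
    p = phi(theta)  # p = 1 - F(-theta) when F is standard normal (assumption)
    return upper_tail(n, critical_value(n, alpha), p)


powers = [power(n, 0.5) for n in (10, 40, 160)]
for n, pw in zip((10, 40, 160), powers):
    print(n, round(pw, 3))
```

The power climbs toward 1 as $n$ grows because $p = 1 - F(-\theta)$ stays fixed above 1/2 while the standardized critical value moves ever further below the mean of $S$.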

Any reasonable test should be consistent, so consistency does not provide a criterion for distinguishing among tests. If a test is not consistent for a reasonable set of alternatives, it should be rejected as defective. The next example illustrates a defective test.


Example 1.3.2. Let $X_1, \ldots, X_n$ be a random sample from a Cauchy distribution with density

$$f(x) = \frac{1}{\pi[1 + (x - \theta)^2]}, \quad -\infty < x < \infty.$$

The characteristic function is given by $\phi(t) = \exp\{-|t| + it\theta\}$.

Suppose, for testing $H_0: \theta = 0$ versus $H_A: \theta > 0$, we reject $H_0$ if $\bar{X} \ge c$. But the characteristic function of $\bar{X}$ is

$$\phi_{\bar{X}}(t) = [\phi(t/n)]^n = \exp\{-|t| + it\theta\}.$$

Hence $\bar{X}$ has exactly the same distribution as $X_1$, independent of $n$. This means the power function of $\bar{X}$ does not depend on $n$, the power cannot tend to 1 for any $\theta > 0$, and $\bar{X}$ does not provide a consistent test for any reasonable set of alternatives.
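
A small simulation makes the defect visible: the spread of the Cauchy sample mean does not shrink as $n$ increases. This sketch is an added illustration, not part of the original example. The seed, the number of replications, and the use of the interquartile range as the measure of spread are all arbitrary choices made here. For the standard Cauchy distribution the quartiles are $-1$ and $1$, so the interquartile range is 2.

```python
import random
from math import pi, tan


def cauchy_sample_mean(rng, n, theta=0.0):
    """Mean of n Cauchy(theta) draws, generated by inverting the cdf."""
    return sum(theta + tan(pi * (rng.random() - 0.5)) for _ in range(n)) / n


def interquartile_range(values):
    """Crude interquartile range from the sorted values."""
    ordered = sorted(values)
    m = len(ordered)
    return ordered[(3 * m) // 4] - ordered[m // 4]


rng = random.Random(12345)  # arbitrary seed, for reproducibility only
reps = 4000
iqr = {
    n: interquartile_range([cauchy_sample_mean(rng, n) for _ in range(reps)])
    for n in (1, 10, 100)
}
print(iqr)
```

For a distribution with finite variance the corresponding figures would shrink roughly like $1/\sqrt{n}$; here they stay near 2 for every $n$.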


1.4. A MOST POWERFUL TEST 

We next apply the Neyman-Pearson approach to construct a uniformly 
most powerful test for the one-sided alternative. At first glance it might 
appear surprising that an optimal test can be found in such a general 
setting. The fact is that there are very few size $\alpha$ tests for the composite null
under consideration, so if one can be found, it may turn out to be optimal. 

The null hypothesis simply states that we are sampling from $F \in \Omega_0$, and the alternative hypothesis states that we are sampling from $G(x) = K(x - \theta)$ where $K \in \Omega_0$ and $\theta > 0$. We use different notation for the null and alternative ($F$ and $K$) to emphasize their composite nature. We can state the hypotheses in slightly different form:

$$H_0: F(0) = 1/2 \quad \text{versus} \quad H_A: G(0) < 1/2.$$

Any distribution function $G$ can be decomposed in the following way:

$$G(x) = P(X \le x) = P(X \le x \mid X \le 0)P(X \le 0) + P(X \le x \mid X > 0)P(X > 0).$$

Recall from (1.2.2) that $p = 1 - K(-\theta) = 1 - G(0)$, and define

$$g_-(x) = \frac{d}{dx} P(X \le x \mid X \le 0) \ \text{ if } x \le 0, \qquad g_-(x) = 0 \ \text{ otherwise},$$

and similarly for $g_+(x)$. Now the density of $G$ can be written

$$g(x) = (1 - p)\,g_-(x) + p\,g_+(x). \tag{1.4.1}$$

This allows us to isolate the median, the parameter to be tested, from the shape of the distribution and to formulate the hypotheses in terms of $p$.

Example 1.4.1. Let $g(x) = 1/3$ for $-1 < x < 2$ and 0 otherwise; then

$$G(x) = \begin{cases} 0, & x \le -1 \\ (x + 1)/3, & -1 < x < 2 \\ 1, & 2 \le x. \end{cases}$$

On $-1 < x < 0$, $P(X \le x \mid X \le 0) = x + 1$, and $g_-(x) = 1$ on $-1 < x < 0$ and 0 otherwise. Likewise, on $0 < x < 2$, $P(X \le x \mid X > 0) = x/2$ and $g_+(x) = 1/2$ on $0 < x < 2$ and 0 otherwise. Now $p = 1 - G(0) = 2/3$. Hence $g(x)$ can be thought of as two uniform densities pieced together and weighted by 1/3 and 2/3.
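
The decomposition in this example can be verified pointwise. The following sketch is an added check, with function names chosen here for illustration. It evaluates the uniform density on $(-1, 2)$ and the mixture $(1 - p)g_-(x) + p\,g_+(x)$ with $p = 2/3$ on a few grid points. The point $x = 0$ is left off the grid because the two pieces are defined on open intervals, which does not matter for a density.

```python
def g(x):
    """Uniform density on (-1, 2)."""
    return 1 / 3 if -1 < x < 2 else 0.0


def g_minus(x):
    """Conditional density of X given X <= 0: uniform on (-1, 0)."""
    return 1.0 if -1 < x < 0 else 0.0


def g_plus(x):
    """Conditional density of X given X > 0: uniform on (0, 2)."""
    return 0.5 if 0 < x < 2 else 0.0


p = 2 / 3  # p = 1 - G(0)


def mixture(x):
    """Right-hand side of the decomposition (1 - p) g_- + p g_+."""
    return (1 - p) * g_minus(x) + p * g_plus(x)


grid = [-1.5, -0.75, -0.25, 0.5, 1.5, 2.5]
for x in grid:
    print(x, g(x), mixture(x))
```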

In general we can specify the distribution being sampled by the triple $(p, g_+, g_-)$; in the example we have $(2/3, g_+, g_-)$. The hypotheses become

$$H_0: (1/2, g_+, g_-) \quad \text{versus} \quad H_A: (p, h_+, h_-), \quad p > 1/2,$$

where $G$ and $H$ are arbitrary distribution functions possessing densities.

To apply the Neyman-Pearson lemma and construct the uniformly most powerful test, we resort to the method of least favorable distributions. The strategy is outlined in the following four steps:

1. Fix an alternative distribution and try to choose that distribution in the composite null hypothesis that is hardest to distinguish from the fixed alternative. If the distribution is chosen correctly, it will be the least favorable distribution.

2. Construct the Neyman-Pearson best size $\alpha$ test for this simple versus simple testing problem.

3. Show that the test remains size $\alpha$ on the composite null hypothesis. If this is not possible, then the least favorable distribution was probably not chosen properly.

4. The final step consists in showing that the test does not depend on the fixed alternative; hence the test is uniformly most powerful.

We now carry out this strategy for the one-sample location problem.

1. Fix $p = p' > 1/2$ and fix $h_+$ and $h_-$ so that the alternative is $(p', h_+, h_-)$. The natural guess for a least favorable distribution is $(1/2, h_+, h_-)$. By using $h_+$ and $h_-$ while setting $p = 1/2$, we hope to make it most difficult to separate the null from the alternative hypothesis. Hence we will try

$$H_0: (1/2, h_+, h_-) \quad \text{versus} \quad H_A: (p', h_+, h_-)$$

with $p' > 1/2$, $h_+$, $h_-$ all specified.

Example 1.4.2. From Example 1.4.1, the anticipated least favorable distribution is given by

$$\tfrac{1}{2} g_-(x) + \tfrac{1}{2} g_+(x) = \begin{cases} 1/2, & -1 < x < 0 \\ 1/4, & 0 < x < 2 \\ 0, & \text{otherwise.} \end{cases}$$

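2. The Neyman-Pearson lemma states that $H_0$ should be rejected when

$$\lambda(x) = \frac{\prod_{i=1}^{n} \left[\tfrac{1}{2} h_-(x_i) + \tfrac{1}{2} h_+(x_i)\right]}{\prod_{i=1}^{n} \left[(1 - p')\,h_-(x_i) + p'\,h_+(x_i)\right]} \le k.$$

Suppose $x_{(i)}$ is the $i$th order statistic and that for the sample,

$$x_{(1)} < \cdots < x_{(r)} < 0 < x_{(r+1)} < \cdots < x_{(n)}.$$

Using (1.4.1), we can then rewrite the rejection rule as

$$\lambda(x) = \frac{(1/2)^n \prod_{i=1}^{r} h_-(x_{(i)}) \prod_{i=r+1}^{n} h_+(x_{(i)})}{(1 - p')^{r} \prod_{i=1}^{r} h_-(x_{(i)})\,(p')^{n-r} \prod_{i=r+1}^{n} h_+(x_{(i)})} \le k,$$

and hence, since $S = n - r$ is the number of positive observations, as

$$\frac{(1/2)^n}{(1 - p')^{n-S}\,(p')^{S}} \le k.$$

Since $p' > 1/2$, after taking logarithms and letting $k$ denote a generic constant, we see the test is equivalent to rejecting $H_0$ when $S \ge k$, where $S$ is the sign test (1.2.1).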

3. The sign test is exact size $\alpha$ on the composite null hypothesis because of the distribution free property.

4. The sign test is uniformly most powerful because the critical region remains the same for all $p' > 1/2$ and any other $(h_+, h_-)$.

Hence we conclude that if we can make no shape assumption about the underlying population distribution, then the sign test is uniformly most powerful. This means that for any fixed distribution with positive median, there is no other size $\alpha$ test with greater power. On the other hand, there are not very many size $\alpha$ tests for the composite null hypothesis considered here.
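
The logarithm step in the Neyman-Pearson argument can be written out explicitly. The derivation below is an added worked step, assuming the likelihood ratio has already been reduced to a function of $S$, the number of positive observations, and of the fixed alternative value $p' > 1/2$. The symbol $k^{*}$ is notation introduced here, not in the text.

```latex
\begin{align*}
\lambda &= \frac{(1/2)^n}{(1-p')^{\,n-S}\,(p')^{S}}, \\
\log \lambda &= n \log\frac{1}{2(1-p')} \;-\; S \log\frac{p'}{1-p'} .
\end{align*}
% Since p' > 1/2, log(p'/(1-p')) > 0, so log(lambda) is strictly decreasing in S:
\[
\lambda \le k \iff S \ge k^{*}, \qquad
k^{*} = \frac{n \log\frac{1}{2(1-p')} - \log k}{\log\frac{p'}{1-p'}} .
\]
```

Although $k^{*}$ is written in terms of $p'$, the critical value actually used is fixed by the size requirement under the $B(n, 1/2)$ null distribution, which involves neither $p'$ nor $h_+$, $h_-$. That is why the same rejection region serves every alternative.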

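Definition 1.4.1. A size $\alpha$ test of $H_0: G \in \Omega_{\text{null}}$ versus $H_A: G \in \Omega_{\text{alt}}$, with critical region $V \ge k$, is said to be unbiased if

$$P_G(V \ge k) \le \alpha \quad \text{for all } G \in \Omega_{\text{null}},$$
$$P_G(V \ge k) \ge \alpha \quad \text{for all } G \in \Omega_{\text{alt}}.$$

The sign test for testing $H_0: \theta = 0$ versus $H_A: \theta > 0$ with $F \in \Omega_0$ is unbiased; see Exercise 1.8.3. In fact the sign test is unbiased for testing the two-sided hypotheses $H_0: \theta = 0$ versus $H_A: \theta \ne 0$ and is the uniformly most powerful unbiased test. For a complete discussion of uniformly most powerful unbiased tests, see Lehmann (1959, p. 147).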


1.5. ESTIMATION

In this section we describe how to derive point and interval estimates of $\theta$ from hypothesis tests. We then apply the method to the sign test. Since several of the nonparametric tests can be put into a sign test or counting form, the results based on the sign test carry over immediately to these other procedures. This allows us to derive the point and interval estimates based on the Wilcoxon signed rank test and the Mann-Whitney-Wilcoxon rank sum test with virtually no effort.

The ideas are more easily introduced in terms of the one-sample t test. Let $x_1, \ldots, x_n$ be the observed sample values and define

$$t(\theta) = \frac{\sqrt{n}\,(\bar{x} - \theta)}{s} = \frac{\sqrt{n}\,\bar{x}}{s} - \frac{\sqrt{n}}{s}\,\theta,$$

where $s^2$ is the usual unbiased estimate of $\sigma^2$. We have written $t(\theta)$ as a linear function of $\theta$. In Fig. 1.2 the graph of $t(\theta)$ is drawn on a $\theta$, $t$ axis system. The horizontal axis then represents the parameter space and the vertical axis the sample space of the test statistic. We have also drawn the null distribution of $t(0)$, under the assumption of normality, on the vertical axis along with the critical points for the size $\alpha$, two-sided test of $H_0: \theta = 0$ versus $H_A: \theta \ne 0$.

The hypothesis test is carried out by checking to see where $t(\theta)$ crosses the vertical axis relative to the critical points; hence, check the value $t(0)$. The $(1 - \alpha)100\%$ confidence interval is found by inverting the acceptance region of the test. It is clearly marked by the interval on the horizontal axis from $\hat\theta_L$ to $\hat\theta_U$. Clearly $P_\theta(\hat\theta_L < \theta < \hat\theta_U) = P_\theta(|t(\theta)| < t_{\alpha/2}) = 1 - \alpha$. Note that $t(\hat\theta_L) = t_{\alpha/2}$ and $t(\hat\theta_U) = -t_{\alpha/2}$, so, for example, $\hat\theta_U = \bar{x} + t_{\alpha/2}\, s/\sqrt{n}$.

The principle that we use to determine the point estimate corresponding to $t(\theta)$ is to select that value of $\theta$ which corresponds to the point of symmetry of the null distribution of $t(0)$. For reasonable tests this generally corresponds to the mode of the null distribution of the test statistic. In essence, we suggest estimating $\theta$ by that value which yields a test statistic value at the center of the null distribution, that is, a most likely value of the test statistic. For the case of $t(\theta)$, this occurs for $t(\hat\theta) = 0$; hence $\hat\theta = \bar{x}$.

Hence all inference, both testing and estimation, can be linked together through this graphical representation. We present the estimate in a more formal fashion later; however, Fig. 1.2 carries the heuristic interpretation of these estimates.

For the sign test, define $S(\theta) = \#(X_i > \theta)$, $i = 1, \ldots, n$. This can also be written $S(\theta) = \#(X_{(i)} > \theta)$, $i = 1, \ldots, n$, where $X_{(i)}$ is the $i$th order statistic. In Fig. 1.3 we graph $S(\theta)$ versus $\theta$ for an even sample size $n$. The null distribution of $S(0)$, the binomial, is constructed on the vertical axis, or sample space. Clearly, $S(\theta)$ is a nonincreasing step function of $\theta$ which steps down at each order statistic. Furthermore, the function $S(\theta)$ is continuous from the right. We thus have the following inequalities:

$$X_{(k+1)} \le \theta \quad \text{if and only if} \quad S(\theta) \le n - k - 1,$$
$$\theta < X_{(n-k)} \quad \text{if and only if} \quad k + 1 \le S(\theta).$$

Hence

$$X_{(k+1)} \le \theta < X_{(n-k)} \quad \text{if and only if} \quad k + 1 \le S(\theta) \le n - k - 1,$$

and it follows that

$$P_\theta(X_{(k+1)} \le \theta < X_{(n-k)}) = P_\theta(k + 1 \le S(\theta) \le n - k - 1) = 1 - P_\theta(S(\theta) \le k) - P_\theta(S(\theta) \ge n - k).$$

Note that $P_\theta(S(\theta) \le k) = P_0(S(0) \le k)$. If we choose $k$ such that $P(S \le k) = \alpha/2$ from the binomial table, then $[X_{(k+1)}, X_{(n-k)})$ is a $(1 - \alpha)100\%$ confidence interval for $\theta$, independent of $F \in \Omega_0$. For simplicity, we will generally use the closed interval $[X_{(k+1)}, X_{(n-k)}]$. This interval still has $(1 - \alpha)100\%$ confidence by the continuity of the underlying distribution. Recall also that $k$ can be approximated [see (1.2.6)] using the Central Limit Theorem.

Since the point of symmetry for the binomial null distribution is $n/2$ (when $n$ is even) we seek a value $\hat\theta$ such that $S(\hat\theta) = n/2$. Any value $\hat\theta$ between $X_{(n/2)}$ and $X_{(n/2+1)}$ will work and by convention we take

$$\hat\theta = \frac{X_{(n/2)} + X_{(n/2+1)}}{2},$$

the median of the sample.

When the sample size is odd, say $n = 2r + 1$, the argument for the confidence interval remains the same. There are two middle bars in the binomial histogram and the estimate $\hat\theta$ is seen to be the unique sample median $X_{(r+1)}$ (Exercise 1.8.5).

Hence the sample median is the natural point estimate derived from the 
sign test, and a confidence interval constructed from the order statistics is
the natural interval with its confidence coefficient given by the null 
distribution of the sign statistic. We now discuss a general formulation of 
the estimation problem due to Hodges and Lehmann (1963) and Lehmann 
(1963). 

Definition 1.5.1. Let $X_1, \ldots, X_n$ be a random sample from $F(x - \theta)$,
$F \in \Omega_0$. Suppose $V$ is a statistic for testing $H_0: \theta = 0$ and define
$V(\theta)$ by replacing $X_i$ by $X_i - \theta$, $i = 1, \ldots, n$. Suppose that
$V(\theta)$ is a nonincreasing function of $\theta$ and the null distribution of
$V = V(0)$ is symmetric about $\mu_0$, free of $F$. Define

$$\theta^* = \sup\{\theta : V(\theta) > \mu_0\}$$

$$\theta^{**} = \inf\{\theta : V(\theta) < \mu_0\} \tag{1.5.1}$$

$$\hat\theta = \frac{\theta^* + \theta^{**}}{2}.$$

The estimator $\hat\theta$ is called the Hodges-Lehmann estimator of $\theta$.
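
As a hedged illustration of Definition 1.5.1, the sketch below evaluates the
sup and inf for the sign statistic $S(\theta) = \#(X_i > \theta)$, whose null
distribution is centered at $n/2$. The function name is hypothetical and the
code assumes a sample with no tied values.

```python
def hl_from_sign(data):
    """Return (theta_star, theta_2star, estimate) for the sign statistic.

    S(theta) equals n - j + 1 just to the left of the j-th order statistic
    and n - j at that order statistic (no ties assumed).
    """
    x = sorted(data)
    n = len(x)
    mu0 = n / 2  # centre of symmetry of the binomial(n, 1/2) null distribution
    # sup{theta : S(theta) > mu0}: largest order statistic with S > mu0 to its left
    theta_star = max(x[j - 1] for j in range(1, n + 1) if n - j + 1 > mu0)
    # inf{theta : S(theta) < mu0}: smallest order statistic at which S < mu0
    theta_2star = min(x[j - 1] for j in range(1, n + 1) if n - j < mu0)
    return theta_star, theta_2star, (theta_star + theta_2star) / 2
```

With an even sample size the two endpoints are the two middle order
statistics; with an odd sample size they coincide at the middle one, so in
both cases the estimate is the sample median.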

If, in addition, we define

$$\hat\theta_L = \inf\{\theta : V(\theta) < C_1\}$$

$$\hat\theta_U = \sup\{\theta : V(\theta) > C_2\} \tag{1.5.2}$$

where $P(V \ge C_1) = P(V \le C_2) = \alpha/2$, then the interval
$[\hat\theta_L, \hat\theta_U)$ is a $(1 - \alpha)100\%$ confidence interval based on $V$.

In the sign test example for $n$ even, $\theta^* = X_{(n/2)}$ and
$\theta^{**} = X_{(n/2+1)}$, whereas for $n = 2r + 1$ odd,
$\theta^* = \theta^{**} = X_{(r+1)}$. The rank tests studied in this book are
nonincreasing step functions of $\theta$ with graphs similar to Fig. 1.3.
The null distributions of the rank tests are symmetric and nondecreasing
(nonincreasing) on the left (right) of the point of symmetry. The point
or points of maximum probability are called modal points. When $n$ is even,
$n/2$ is the unique modal point of $S$; when $n$ is odd, $n = 2r + 1$, $r$ and
$r + 1$ are the two modal points. The Hodges-Lehmann estimate corresponds to
these modal points and can be loosely thought of as a maximum probability
estimate relative to the distribution of the test statistic.

When the critical values, $C_1$ and $C_2$, are integers, we can define
$\hat\theta_L$ and $\hat\theta_U$ as the smallest and largest solutions such that
$V(\hat\theta_L) = C_1 - 1$ and $V(\hat\theta_U) = C_2 + 1$.

We next present a theorem that shows the estimation procedures are
translation statistics; so, if we add a constant to all the observations, we
need only add that constant to the estimates. This translation property
allows us to let $\theta = 0$ without loss of generality in the study of the
distributional properties of the estimates.

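Theorem 1.5.1. $\hat\theta$, $\hat\theta_L$, and $\hat\theta_U$ are translation statistics;
that is, $\hat\theta(x_1 + a, \ldots, x_n + a) = \hat\theta(x_1, \ldots, x_n) + a$, and
likewise for $\hat\theta_L$ and $\hat\theta_U$.

Proof. Let $V_a$ denote $V$ computed on $x_1 + a, \ldots, x_n + a$ for some fixed
$a$. Then $V_a(\theta)$ is computed on $x_1 - \theta + a, \ldots, x_n - \theta + a$ and
hence $V_a(\theta) = V(\theta - a)$. Now,

$$\sup\{\theta : V_a(\theta) > \mu_0\} = \sup\{\theta : V(\theta - a) > \mu_0\} - a + a$$

$$= \sup\{\theta - a : V(\theta - a) > \mu_0\} + a$$

$$= \sup\{\theta : V(\theta) > \mu_0\} + a.$$

Hence $\theta^*(x_1 + a, \ldots, x_n + a) = \theta^*(x_1, \ldots, x_n) + a$. A similar
argument works for $\theta^{**}$, $\hat\theta_L$, and $\hat\theta_U$.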

Example 1.5.1. Recall from Example 1.1.1 that experiments were conducted
in which the birds were taken out on lines at 90° and 180° from the
training line and released. One measurement of interest was the angle
between the bird's line of flight, as it disappeared over the horizon, and the
homing line. Table 1.2 gives these angles for 28 birds released on a sunny
day. All angles are measured between 0° and 180°, so lines above or below
the homing line are not distinguished. Let $\theta$ denote the population median
error angle. We make no shape assumption on the underlying population of
error angles. If the birds are not homing then $\theta = 90°$; hence we will test
$H_0: \theta = 90°$ versus $H_A: \theta < 90°$. Using the sign test and corresponding
estimates, we begin the investigation.

Table 1.2. Angular Errors for Birds Released on a
Sunny Day (Artificial Data)

6, 1, 9, 17, 18, 18, 22, 28, 32, 35, 36, 42, 42, 42, 48, 48,
51, 52, 53, 55, 56, 57, 58, 63, 72, 83, 91, 97

For testing $H_0: \theta = 90°$ versus $H_A: \theta < 90°$ we find that $S = 2$, where
$S = \#(\text{observations} > 90°)$. For $n = 28$ we reject $H_0$ at level
$\alpha = .018$ when $S \le 8$. Since $S = 2$ we easily reject $H_0$ at this level.
The estimated error angle $\hat\theta$ is the median of the sample and
$\hat\theta = 45°$. A 91% confidence interval for $\theta$ is given by
$[X_{(10)}, X_{(19)}) = [35, 53)$ since $P(S \le 9) = .045$. Hence the
sign test and companion estimates provide strong evidence that on sunny
days the birds are systematically tending toward the homing line. The
major portion of the study involved two-sample comparisons in which birds
were released on both sunny and cloudy days in order to see if the birds
used the sun to navigate. See Exercise 1.8.11.
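
The numbers in Example 1.5.1 can be rechecked with a short script. This is a
sketch, not the author's computation: the helper name is an assumption, the
binomial probabilities are computed exactly rather than read from a table (so
the last digit may differ slightly from the quoted values), and the data are
typed in as they appear in this copy of Table 1.2.

```python
from math import comb
from statistics import median

# Table 1.2 as printed in this copy. The second entry appears as "1";
# any value below 35 there gives the same S, median, and interval.
angles = [6, 1, 9, 17, 18, 18, 22, 28, 32, 35, 36, 42, 42, 42, 48, 48,
          51, 52, 53, 55, 56, 57, 58, 63, 72, 83, 91, 97]


def binom_cdf_half(k, n):
    """P(S <= k) for S ~ binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(k + 1)) / 2 ** n


n = len(angles)
s = sum(a > 90 for a in angles)        # sign statistic for H0: theta = 90
level = binom_cdf_half(8, n)           # level of the test "reject when S <= 8"
theta_hat = median(angles)             # point estimate

k = 9
x = sorted(angles)
lower, upper = x[k], x[n - k - 1]      # X_(10) and X_(19)
confidence = 1 - 2 * binom_cdf_half(k, n)

print(n, s, round(level, 3), theta_hat, (lower, upper), round(confidence, 2))
```

The script recovers S = 2, a level of about .018, the median 45, and the
endpoints 35 and 53, with confidence of about 91%.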

If there are observations in the sample equal to the null hypothesized 
value they are set aside when calculating the sign test and the sample size is 
reduced accordingly. These values are not excluded for purposes of estima- 
tion. For a further discussion and other alternatives for handling this 
problem, see Lehmann (1975, p. 123). 


1.6. STABILITY 

Thus far we have discussed the statistical properties of the sign test.
Robustness of statistical methods can be thought of as describing the
stability of the procedure itself. In particular, we wish to avoid statistical
procedures that can be unduly influenced by a small fraction of the data.
The $t$ test and its companion estimate $\bar{X}$ are examples of procedures that
[...] but are also highly unstable. Only one observation suffices to alter the
values of $t$ and $\bar{X}$ by an arbitrarily large amount. The sign test and the
sample median, on the other hand, [...] possible to alter a few observations
[...]. In this section we describe more formally some aspects of stability and
apply these ideas to the statistical methods considered above.

Definition 1.6.1. Given an estimate $\hat\theta$ of $\theta$, suppose there exists an
integer $a$ such that

(a) $x_{(a+1)} \le \hat\theta \le x_{(n-a)}$.

(b) For any fixed values of $x_{(a+2)}, \ldots, x_{(n)}$, $\hat\theta \to -\infty$ if
$x_{(a+1)} \to -\infty$.

(c) For any fixed values of $x_{(1)}, \ldots, x_{(n-a-1)}$, $\hat\theta \to \infty$ if
$x_{(n-a)} \to \infty$.

If $a^*$ is the smallest such integer, then $\hat\theta$ can tolerate at most $a^*$ bad
observations. The tolerance is defined as

$$\tau_n = \frac{a^*}{n}.$$

The idea of tolerance was introduced by Hodges (1967) and is related to
the idea of breakdown of an estimator. For a discussion of breakdown, see
Hampel (1974) and Huber (1981). In many cases, $\lim \tau_n = \tau$, and the
asymptotic tolerance is the breakdown value. All other things being equal,
we would like to have an estimator with high tolerance.

Example 1.6.1. The sample mean $\bar{X}$ has 0 tolerance because
$x_{(1)} \le \bar{X} \le x_{(n)}$, so that $a^* + 1 = 1$ and $a^* = 0$. If $n = 2r$, the
tolerance of the median is $(r - 1)/2r$, and if $n = 2r + 1$, the tolerance is
$r/(2r + 1)$; in either case the asymptotic tolerance is $\tau = .5$. The
$\alpha$-trimmed mean is the mean of the middle $n - 2[\alpha n]$ observations after
$[\alpha n]$ observations have been trimmed from both ends of the ordered sample.
Since $a^* = [\alpha n]$, the asymptotic tolerance of the trimmed mean is
$\tau = \alpha$.

Definition 1.6.2. Suppose $V$ is a test statistic such that we reject
$H_0: \theta = 0$ for $H_A: \theta > 0$ when $V \ge k$. Suppose there exists an integer
$a$ such that for given, fixed values of $x_{a+2}, \ldots, x_n$, values of
$x_1, \ldots, x_{a+1}$ can be chosen to force $V < k$. Let $a^*$ be the smallest such
integer; then the tolerance to acceptance is defined to be

$$\tau_n(\text{accept}) = \frac{a^*}{n}.$$

Since $V$ cannot be forced to accept with $a^*$ or fewer observations, it can
tolerate at most $a^*$ bad observations. Furthermore, for a given sample, we
can control $V$ if we can alter $a^* + 1$ sample values.

Similarly the tolerance to rejection is defined by the smallest integer $b^*$
such that, for any fixed values of $x_{b^*+2}, \ldots, x_n$, we can choose values
$x_1, \ldots, x_{b^*+1}$ to force $V \ge k$. Then $\tau_n(\text{reject}) = b^*/n$. A similar
definition of testing tolerance is given by Ylvisaker (1977) and discussed
further by Rieder (1982).

Example 1.6.2. The sign test rejects $H_0: \theta = 0$ for $H_A: \theta > 0$ when
$S \ge k$. If we fix $n - k + 1$ observations less than zero, then no matter what
the values of the remaining observations, it follows that $S < k$. Hence
$a^* = n - k$ and $\tau_n(\text{accept}) = (n - k)/n$. Substitute the approximate value
of $k$ from (1.2.7) to get

$$\tau_n(\text{accept}) \doteq \frac{1}{2} - Z_\alpha \frac{1}{2\sqrt{n}} - \frac{1}{2n}$$

$$\tau(\text{accept}) = \lim \tau_n(\text{accept}) = \frac{1}{2}.$$

Hence the tolerance to acceptance of the sign test depends on the
significance level $\alpha$ and converges to the tolerance of the median from
below. See Exercise 1.8.6 for $\tau_n(\text{reject})$.

In Exercise 1.8.7 it is pointed out that for the $t$ statistic
$\tau_n(\text{accept}) = 0$. Ylvisaker (1977) shows that
$\tau_n(\text{reject}) = [k^2/(n + k^2)] - 1/n$, where $k$ is the critical value of the
$t$ test.

Just as in the case with estimation, it is desirable to have tests with high 
tolerance to both acceptance and rejection. Then the test will not be 
controlled by a small fraction of the data. Table 1.3 lists some numerical 
values of the tolerances of the sign and t tests. 
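
The sign-test tolerances can be computed directly from the definitions. The
sketch below is illustrative only and rests on two assumptions that are not
taken from the text: the critical value k is chosen as the one whose exact
binomial level is closest to the nominal level, and the tolerance to rejection
is taken as (k - 1)/n, which follows from the definition because k positive
observations force S >= k.

```python
from math import comb


def upper_tail_half(k, n):
    """P(S >= k) for S ~ binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def sign_test_tolerances(n, alpha=0.05):
    """Return (k, tau_accept, tau_reject) for the test rejecting when S >= k.

    Assumption: k is the critical value whose exact level is closest to alpha.
    """
    k = min(range(n + 2), key=lambda c: abs(upper_tail_half(c, n) - alpha))
    a_star = n - k   # fixing n - k + 1 negative observations forces S < k
    b_star = k - 1   # fixing k positive observations forces S >= k
    return k, a_star / n, b_star / n
```

For n = 10, 13, 18, and 30 this rule gives tolerances to acceptance of about
.20, .23, .28, and .33 and tolerances to rejection of about .70, .69, .67,
and .63.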

We next consider the effect on the estimate of tossing a single observa- 
tion into the sample. 

Definition 1.6.3. If the estimate $\hat\theta_n$ is based on a sample of $n$
observations and an additional observation with value $x$ is introduced, the
differential effect on the estimate is measured by the sensitivity curve

$$SC(x) = (n + 1)(\hat\theta_{n+1} - \hat\theta_n).$$

Table 1.3. Tolerance of Sign and t Tests, $\alpha = .05$

          $\tau_n$(accept)     $\tau_n$(reject)
   n        t       S          t       S
  10        0      .20        .15     .70
  13        0      .23        .11     .69
  18        0      .28        .08     .67
  30        0      .33        .06     .63
 100        0      .40        .02     .59
 $\infty$       0      .50         0      .50



The Princeton robustness study (Andrews et al., 1972) included a stylized
version of the sensitivity curve. Nineteen expected normal order
statistics were taken as the base "sample" and the additional $x$ was varied
to determine the stylized sensitivity. See Section 5E of the Princeton study
for the graphs of stylized sensitivity curves.

Hampel (1974) points out that when the sensitivity curve is properly
normalized, in the limit it corresponds to the influence curve. The influence
curve measures the influence on the population characteristic being
estimated (e.g., population mean or median) of a contaminating point mass in
the underlying distribution.

The definition of the influence curve for an estimator depends on being
able to represent the estimator as a functional evaluated at the empirical
cumulative distribution function (cdf). For example, the mean and median
of a cdf, $H(x)$, can be written $A(H) = \int x h(x)\,dx = \int x\,dH(x)$ and
$M(H) = H^{-1}(1/2)$. See the Appendix for a discussion of the Stieltjes integral
$\int g(x)\,dH(x)$. We suppose the mean $A(H)$ exists and when the median is
not unique we will take $M(H) = \inf\{x : H(x) \ge 1/2\}$. The natural
estimates of $A(H)$ and $M(H)$ are $A(H_n) = \int x\,dH_n(x) = n^{-1}\sum x_i$, the
sample mean, and $M(H_n) = H_n^{-1}(1/2)$, the sample median.

In order to measure the influence of a point mass at $y$, Hampel (1974)
suggests evaluating the Gateaux derivative of the functional $T(H)$ in the
direction of a cdf that assigns probability 1 to the value $y$. This derivative
is just the ordinary derivative of $T(H_t)$ with respect to the real variable $t$
evaluated at $t = 0$, where $H_t(x) = (1 - t)H(x) + t\delta_y(x)$, $\delta_y(x) = 0$ if
$x < y$, and 1 if $x \ge y$. See Huber (1981) for more discussion. It measures how
quickly the functional changes in the direction of contamination.

Definition 1.6.4. The influence curve for the functional $T(\cdot)$ is

$$\Omega(y) = \frac{d}{dt} T(H_t) \Big|_{t=0}.$$

From the point of view of robustness it is desirable to have estimates
with bounded sensitivity and influence curves. This means that a single
observation cannot have an arbitrarily large effect on the estimate. Another
desirable property, perhaps better reflected in the influence curve, is
continuity. Jumps or discontinuities in the sensitivity or influence curves
indicate local instability at the jump point. For example, an estimate with
such a sensitivity curve may be adversely affected by round-off error at
these points.


Example 1.6.3. Without loss of generality, suppose we have a sample of $n$
observations, and by good fortune the sample mean and median are both 0.
It is easy to verify that the sample mean has $SC(x) = x$. Hence the
sensitivity is linear and unbounded. This is just another way of saying that
a single observation can change the sample mean by an arbitrarily large
amount.

Now suppose $n = 2r$, even, and
$x_{(1)} < \cdots < x_{(r)} < 0 < x_{(r+1)} < \cdots < x_{(n)}$. The sensitivity curve
for the median is given by

$$SC(x) = \begin{cases} (n + 1)x_{(r)}, & x < x_{(r)} \\ (n + 1)x, & x_{(r)} \le x \le x_{(r+1)} \\ (n + 1)x_{(r+1)}, & x_{(r+1)} < x \end{cases}$$

and the graph is given in Fig. 1.4.
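
The sensitivity curves of Example 1.6.3 are easy to evaluate numerically. The
sketch below is an illustration only: the function name and the six-point
sample (chosen so that both the mean and the median are 0) are hypothetical.

```python
from statistics import mean, median


def sensitivity_curve(estimator, sample, x):
    """SC(x) = (n + 1) * (estimate with x added - estimate on the sample)."""
    n = len(sample)
    return (n + 1) * (estimator(list(sample) + [x]) - estimator(sample))


# Hypothetical sample with n = 2r = 6, mean 0, median 0,
# x_(3) = -1 and x_(4) = 1.
sample = [-4.0, -3.0, -1.0, 1.0, 2.0, 5.0]

for x in (-100.0, 0.5, 100.0):
    print(x, sensitivity_curve(mean, sample, x), sensitivity_curve(median, sample, x))
```

For the mean the curve follows x without bound, while for the median it is
capped at (n + 1) times the two middle order statistics.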


Example 1.6.4. Let $H_t(x) = (1 - t)H(x) + t\delta_y(x)$, and then
$A(H_t) = \int x\,dH_t(x) = \int x\,d[(1 - t)H(x) + t\delta_y(x)]$. Hence

$$\Omega(y) = \frac{d}{dt} A(H_t) \Big|_{t=0} = \int x\,d\delta_y(x) - \int x\,dH(x) = y - A(H).$$

[Fig. 1.4. Vertical axis: SC(x)]

Hence the mean functional $A(H)$ has an unbounded, linear influence
curve, similar to the sensitivity curve. In fact, if we take the mean of
$H(\cdot)$ to be 0, then $\Omega(y) = y$.

Now, note that
$1/2 = H_t(H_t^{-1}(1/2)) = H(H_t^{-1}(1/2)) + t[\delta_y(H_t^{-1}(1/2)) - H(H_t^{-1}(1/2))]$.
Hence, if $h(\cdot)$ is the probability density function (pdf) of $H(\cdot)$, we have

$$\Omega(y) = \frac{d}{dt} H_t^{-1}(1/2) \Big|_{t=0} = \frac{1/2 - \delta_y(H^{-1}(1/2))}{h(H^{-1}(1/2))}.$$

Using the definition of $\delta_y(\cdot)$ and supposing $H(\cdot)$ has median 0 without
loss of generality, we have

$$\Omega(y) = \begin{cases} -\dfrac{1}{2h(0)}, & y < 0 \\ \dfrac{1}{2h(0)}, & y > 0. \end{cases}$$

This can be compared to the sensitivity curve in Fig. 1.4. The influence
curve is the limiting case of the sensitivity curve and has a jump
discontinuity at the population median. Both sensitivity and influence
curves are bounded, showing the extreme insensitivity of the median to
outliers.
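
The influence curve of the median can be checked numerically by contaminating
a known distribution and differencing. The sketch below assumes, purely for
illustration, that $H$ is the standard normal cdf; the function names, the
bisection bracket, and the step t = 1e-4 are choices made here and are not
from the text.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal cdf, playing the role of H."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def contaminated_median(t, y, lo=-10.0, hi=10.0, iters=200):
    """inf{x : H_t(x) >= 1/2} for H_t = (1 - t) * H + t * (point mass at y)."""
    def h_t(x):
        return (1.0 - t) * normal_cdf(x) + (t if x >= y else 0.0)

    # H_t is nondecreasing and right-continuous, so bisection finds the inf.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h_t(mid) >= 0.5:
            hi = mid
        else:
            lo = mid
    return hi


def influence_of_median(y, t=1e-4):
    """Finite-difference approximation to d/dt M(H_t) at t = 0."""
    return (contaminated_median(t, y) - contaminated_median(0.0, y)) / t
```

For a contaminating point on the positive side the approximation is close to
$1/(2h(0)) = \sqrt{2\pi}/2$, and on the negative side it is close to the
negative of that value, regardless of how far out the point is placed.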


In cases where the estimator is a functional $T(\cdot)$ evaluated at the
empirical cdf, we have $\hat\theta = T(F_n)$ and $\theta = T(F)$. The influence function
$\Omega(y)$ often provides a representation suggesting the asymptotic distribution
of the estimator. Huber (1981, Section 2.5) points out that under regularity
conditions,

$$\sqrt{n}\,(T(F_n) - T(F)) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Omega(X_i) + o_p(1) \tag{1.6.1}$$

where $o_p(1)$ tends to 0 in probability. The Central Limit Theorem applies to
the first term on the right side; and, since
$\sqrt{n}\,(\hat\theta - \theta) = n^{1/2}(T(F_n) - T(F))$, we have

$$\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{D} Z \sim n\left(0, \int \Omega^2(x)\,dF(x)\right), \tag{1.6.2}$$

when the influence function is centered so that $\int \Omega(x)\,dF(x) = 0$.

A rigorous development requires a careful analysis of the remainder
term $o_p(1)$ for each estimator. However, (1.6.2) is an excellent heuristic
guide which generally anticipates the correct asymptotic distribution. From
Example 1.6.4, if the sample comes from $H(x - \theta)$, the influence curve for
the median yields $\int \Omega^2(x)\,dH(x) = 1/[4h^2(0)]$. Hence we would at least need
the density existing and positive at 0. Then (1.6.2) suggests that if
$\hat\theta = \text{med}\,X_i$, then

$$\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{D} Z \sim n\left(0, \frac{1}{4h^2(0)}\right).$$

For an extensive discussion of this approach see Serfling (1980, Chapter 6).
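
As a worked instance of the variance $1/[4h^2(0)]$, assume (purely for
illustration; the text does not single out this case here) that the sample
comes from the standard normal distribution shifted by $\theta$:

```latex
h(0) = \frac{1}{\sqrt{2\pi}}, \qquad
\frac{1}{4h^2(0)} = \frac{2\pi}{4} = \frac{\pi}{2} \approx 1.571 .
```

Under that assumption the limiting variance suggested by (1.6.2) for the
median is about 1.571, compared with 1 for $\sqrt{n}\,(\bar{X} - \theta)$ at the
same model.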


1.7. SUMMARY AND EFFECT OF DEPENDENCE IN DATA 

If we suppose that our samples come from an absolutely continuous
distribution with a unique median $\theta$, (1.1.2), then the simple sign test,
(1.2.1), is the uniformly most powerful size $\alpha$ test of $H_0: \theta = 0$ versus
$H_A: \theta > 0$. The distribution theory under both null and alternative
hypotheses is given by the binomial distribution. Under $H_0: \theta = 0$, the
distribution of the sign test does not depend on the distribution sampled,
that is, it is distribution free. The sign statistic is expressible as the sum
of i.i.d. random variables under both hypotheses, and the Central Limit
Theorem can be applied to approximate needed probabilities and critical
values. In addition to being optimal, the sign test is consistent and unbiased
for any alternative distribution with nonzero median. It also has positive
tolerance to both acceptance and rejection and is thus not unduly affected by
a small portion of the data.

Natural point and interval estimates can be easily derived from the sign 
test. The point estimate is the sample median and the interval estimate is 
defined by appropriate order statistics. The confidence coefficient is deter- 
mined by the null binomial distribution of the sign test. The median is a 
robust estimate; it has positive tolerance and bounded sensitivity and 
influence curves. In Section 6.2 the sign test is extended to the one-sample
multivariate location model.

Throughout this chapter we have assumed that the data consist of
independent observations from a location model. Most of the properties of
the sign statistic depend quite heavily on the independence assumption. To
illustrate what happens when independence fails, we consider a simple
serial correlation model: $X_1, X_2, \ldots$ are normally distributed with mean
$\theta$, variance 1, and correlation $\text{corr}(X_i, X_j) = \rho$ if $j = i + 1$ and 0
otherwise. The correlation $\rho$ is restricted to $-.5 < \rho < .5$; see Scheffé
(1959, Section 10.2). Hence only immediate neighbors are correlated. We
further suppose that for $i = 1, 2, \ldots$, $(X_i, X_{i+1})$ has a bivariate normal
distribution with means $\theta$, variances 1, and correlation $\rho$. If we did not
suspect serial correlation in the data, for large $n$ a nominal 5% test based
on $S$ would reject $H_0: \theta = 0$ in favor of $H_A: \theta > 0$ if
$S \ge n/2 + 1.645\,n^{1/2}/2$. Here we have used (1.2.6) and ignored the
continuity correction for large $n$.

Under the null hypothesis $H_0: \theta = 0$ and assuming nonzero serial
correlation we have

$$ES = E\sum s(X_i) = \frac{n}{2}$$

$$\text{Var}\,S = E\left\{\sum (s(X_i) - \tfrac{1}{2})\right\}^2 \tag{1.7.1}$$

$$= n\,\text{Var}(s(X_1)) + 2(n - 1)\,\text{Cov}(s(X_1), s(X_2))$$

$$= \frac{n}{4} + 2(n - 1)\left\{P(X_1 > 0, X_2 > 0) - \frac{1}{4}\right\},$$

since $\text{Var}(s(X_1)) = E(s(X_1))^2 - \frac{1}{4} = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}$. Hence
$\text{Var}\,S$ is altered and includes $P(X_1 > 0, X_2 > 0)$, which shows that $S$ is
no longer distribution free under $H_0$.

In Exercise 1.8.14 you are asked to show that
$P(X_1 > 0, X_2 > 0) = 1/4 + (1/2\pi)\sin^{-1}\rho$. Hence we have

$$\text{Var}\,S = \frac{n}{4} + \frac{n - 1}{\pi}\sin^{-1}\rho. \tag{1.7.2}$$

In Exercise 1.8.15 you are asked to verify the conditions of Theorem A16
and hence show that $(S - ES)/(\text{Var}\,S)^{1/2}$ is limiting $n(0, 1)$.

We can now approximate the true level of the test and compare it to the
nominal 5% value which we assumed to be true. Let $\alpha_S$ denote the true
significance level; then for large $n$,

$$\alpha_S \doteq 1 - \Phi\left(\frac{1.645\sqrt{n}/2}{\sqrt{\text{Var}\,S}}\right) \doteq 1 - \Phi\left(\frac{1.645}{\sqrt{1 + (4/\pi)\sin^{-1}\rho}}\right). \tag{1.7.3}$$


Again, ignoring the serial correlation, the nominal 5% test based on the
sample mean rejects $H_0: \theta = 0$ when $n^{1/2}\bar{X} \ge 1.645$. Under
$H_0: \theta = 0$ and assuming the serial correlation model, $E(n^{1/2}\bar{X}) = 0$,
$\text{Var}(n^{1/2}\bar{X}) = 1 + 2(n - 1)\rho/n$, and $n^{1/2}\bar{X}$ is normally distributed
since it is a linear combination of normally distributed random variables.
The true significance level is $\alpha_{\bar{X}}$,

$$\alpha_{\bar{X}} = P(\sqrt{n}\,\bar{X} \ge 1.645)$$

$$= P\left(\frac{\sqrt{n}\,\bar{X}}{\sqrt{1 + 2(n - 1)\rho/n}} \ge \frac{1.645}{\sqrt{1 + 2(n - 1)\rho/n}}\right)$$

$$\doteq 1 - \Phi\left(\frac{1.645}{\sqrt{1 + 2\rho}}\right). \tag{1.7.4}$$

In Table 1.4 we record the true significance level, for various values of
$\rho$, for nominal 5% sign and $\bar{X}$ tests.


Table 1.4. True Significance Level for 5% $S$ and $\bar{X}$ Tests

                                        $\rho$
Test      -.49   -.4   -.3   -.2   -.1    0     .1    .2    .3    .4   .49
$\bar{X}$       .000  .000  .005  .017  .033  .05  .067  .082  .097  .109  .121
$S$       .003  .009  .018  .028  .039  .05  .061  .071  .081  .092  .100
$T$       .000  .000  .006  .018  .033  .05  .067  .081  .095  .107  .119

$T$ is the Wilcoxon signed rank test; see (2.7.12).

Hence both tests have true levels that can be far from the nominal level.
The sign test not only is no longer distribution free under $H_0$ but is
affected almost but not quite as much as the $\bar{X}$ test. When the correlation
is positive, we tend to have "too many" rejections since large observations
pull others up with them. Hence, for the larger positive correlations in
Table 1.4, the tests reject $H_0$ about 10% of the time rather than the
expected 5%. The moral is clear: even the most robust procedures may be
highly unreliable under the most simple models of dependence in the data.
Gastwirth and Rubin (1971) reach the same conclusion on more general
autoregressive models. The same authors, in 1975, studied the behavior of
robust estimators for models with dependent data.
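
The sign and sample mean rows of Table 1.4 can be regenerated from (1.7.3)
and (1.7.4). The sketch below is an illustration; the function names are
assumptions, and the normal cdf is computed from the error function rather
than read from a table, so the third decimal may occasionally differ from the
printed entries.

```python
from math import asin, erf, pi, sqrt


def normal_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def true_level_sign(rho, z=1.645):
    """Large-sample true level of the nominal 5% sign test, from (1.7.3)."""
    return 1.0 - normal_cdf(z / sqrt(1.0 + (4.0 / pi) * asin(rho)))


def true_level_mean(rho, z=1.645):
    """Large-sample true level of the nominal 5% sample mean test, from (1.7.4)."""
    return 1.0 - normal_cdf(z / sqrt(1.0 + 2.0 * rho))


for rho in (-0.49, -0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.49):
    print(f"{rho:5.2f}  mean {true_level_mean(rho):.3f}  sign {true_level_sign(rho):.3f}")
```

Both formulas require the quantity under the square root to be positive, which
holds throughout the range of correlations allowed by the model.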


1.8. EXERCISES

1.8.1. Construct a distribution that has many medians.

1.8.2. For $n = 10$, $\alpha = .0547$:

a. Construct the sign test for $H_0: \theta = 0$ versus $H_A: \theta > 0$.
b. Find the exact power for $n(1, 1)$ and double exponential with
$f(x) = 2^{-1}\exp\{-|x - 1|\}$.
c. Find the approximate power at $n(1, 1)$ using the normal
approximation with continuity correction.
d. Suppose $F$ is $n(1, 1)$ and find the power of the test based on $\bar{X}$.

1.8.3. Prove that the sign test is strictly unbiased for $H_0: \theta = 0$ versus
$H_A: \theta > 0$, $F \in \Omega_0$; that is, show that for admissible values of
$\alpha$, $P_{H_0}(S \ge k) = \alpha$ and $P_G(S \ge k) > \alpha$ for any alternative $G$.
Hint: Use the beta integral representation of the binomial
distribution.

1.8.4. Suppose $V$ rejects $H_0: \theta = 0$ versus $H_A: \theta > 0$ when $V \ge k$. Let
$V(\theta)$ be $V$ computed on $X_1 - \theta, \ldots, X_n - \theta$. If
$P_{H_0}(V \ge k) = \alpha$ and $V(\theta)$ is nonincreasing in $\theta$, then prove that
$P_G(V \ge k) \ge \alpha$ for all alternatives $G$. Hence $V$ provides an
unbiased test.

1.8.5. Construct the graph of $S(\theta)$ for $n$ odd and indicate the point and
interval estimates.

1.8.6. Find the tolerance to rejection for the sign test and show how it
converges, as $n$ increases, to the tolerance of the median.

1.8.7. Show that the $t$ test has 0 tolerance to acceptance.

1.8.8. Construct the sensitivity curve for the median when $n$ is odd.



1.8.9. Construct the sensitivity curve for the trimmed mean defined in 
Example 1.6.1. 

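1.8.10. Suppose $F$ is symmetric about 0 and define

$$T(F) = (1 - 2\alpha)^{-1} \int_{\alpha}^{1-\alpha} F^{-1}(t)\,dt;$$

then $T(F) = 0$, the point of symmetry. This is the functional for the
$\alpha$-trimmed mean. Show the influence curve is given by

$$\Omega(y) = \begin{cases} (1 - 2\alpha)^{-1}F^{-1}(\alpha), & y < F^{-1}(\alpha) \\ (1 - 2\alpha)^{-1}y, & F^{-1}(\alpha) \le y \le F^{-1}(1 - \alpha) \\ (1 - 2\alpha)^{-1}F^{-1}(1 - \alpha), & y > F^{-1}(1 - \alpha). \end{cases}$$

Hence when sampling from $F(x - \theta)$, $F \in \Omega_s$, by (1.6.2) we have
$\sqrt{n}\,(T(F_n) - \theta)$ is asymptotically normally distributed with mean 0
and variance

$$\sigma^2 = \frac{1}{(1 - 2\alpha)^2}\left\{\int_b^a x^2\,dF(x) + 2\alpha a^2\right\},$$

where $b = -a = F^{-1}(\alpha)$, and $\Omega_s$ is defined in (2.1.1).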

1.8.11. In Example 1.5.1, error angle measurements were given for 28
birds released on sunny days. The following data were taken on 13
birds released on cloudy days (error angle in degrees): 8, 10, 38,
43, 45, 57, 73, 76, 83, 105, 112, 126, 141. Construct an approximate
5% sign test for $H_0: \theta = 90°$ versus $H_A: \theta < 90°$ and carry out the
test on the data. Also construct the point estimate of $\theta$ and an
approximate 90% confidence interval for $\theta$. As in Example 1.5.1, $\theta$
represents the population median error angle.

1.8.12. Define $S^* = \#(X_i > 0) - \#(X_i < 0)$. Find the mean, variance,
and distribution of $S^*$. Let $\text{sgn}(x) = 1$, 0, or $-1$ as $x > 0$, $= 0$, or
$< 0$, respectively; then $S^* = \sum \text{sgn}(X_i)$. Find the relationship
between $S$ and $S^*$. Find the limiting distribution of $S^*$ and describe
how to construct a confidence interval for $\theta$ based on $S^*$.

1.8.13. Suppose $X_1, X_2, \ldots, X_{2n}$ are independent observations such that
$X_i$ has cdf $F(x - \theta_i)$ where $F$ is symmetric about 0. For testing
$H_0: \theta_1 = \cdots = \theta_{2n}$ versus $H_A: \theta_1 \le \cdots \le \theta_{2n}$ with at
least one strict inequality, we consider the Cox-Stuart (1955) sign test
for trend:

$$S = \sum_{i=1}^{n} s(X_{n+i} - X_i), \quad s(x) = 1 \text{ if } x > 0 \text{ and 0 otherwise}.$$

a. Define $p_i = P(X_{n+i} > X_i)$ and $q_i = 1 - p_i$. Let
$Z_i = s(X_{n+i} - X_i) - p_i$ and use Theorem A6 to prove that

$$\frac{S - \sum p_i}{\left(\sum p_i q_i\right)^{1/2}} \xrightarrow{D} Z \sim n(0, 1),$$

provided $\sum p_i q_i$ is a divergent series.

b. Discuss the small sample and asymptotic distribution of $S$
under $H_0$.

c. Show that $\bar{S} = S/n$ provides a consistent test provided
$[1/(n)^{1/2}]\sum (p_i - 1/2) \to +\infty$. In particular, show the test is
consistent if [...] $\Delta > 0$ for all [...] $2n$ and $n = 1, 2, \ldots$.
Hint: Use the asymptotic normality under alternatives given in part (a).

1.8.14. Suppose $(X, Y)$ has a bivariate normal distribution with means 0,
variances 1, and correlation $\rho$. Show that
$P(X < 0, Y < 0) = P(X > 0, Y > 0) = \frac{1}{4} + (1/2\pi)\sin^{-1}\rho$.
Hint: Transform to polar coordinates.

1.8.15. Suppose $X_1, X_2, \ldots$ are normally distributed with mean 0,
variance 1, and correlation $\text{corr}(X_i, X_j) = \rho$ if $j = i + 1$ and 0
otherwise. Show that Theorem A16 applies and that
$n^{1/2}(\bar{S} - 1/2)$ is asymptotically normally distributed with mean 0
and variance $\sigma^2 = 1/4 + 2\{P(X_1 > 0, X_2 > 0) - 1/4\}$, where
$\bar{S} = n^{-1}\sum s(X_i)$.



CHAPTER 2 


The One-Sample Location Model 
with a Symmetric, Continuous 
Distribution 


2.1. INTRODUCTION 

In the first chapter we found that if there is no shape assumption made on 
the underlying population, then the sign test is uniformly most powerful 
size $\alpha$ for a one-sided hypothesis about the median. Hence, without further
assumptions, there is no reason to introduce additional statistical tests. If 
we make more restrictive assumptions, such as normality of the underlying 
population, then we can again seek optimal procedures. For the location 
problem, these normal model procedures are based on the mean. In this 
chapter, we stop short of normality and introduce the assumption of 
symmetry on the underlying population, then consider how this informa- 
tion can be used to develop additional statistical procedures, such as rank 
tests. 

Typically, the distribution theory for a rank test is more complicated 
than that for the sign test. Under the null hypothesis, the rank test statistic 
can be represented as a sum of independent but not identically distributed 
random variables. The independence is lost when we consider the alterna- 
tive hypothesis. This necessitates the presentation of theorems beyond the 
Central Limit Theorem to handle the asymptotics. We also introduce 
asymptotic efficiency as a means of comparing statistical procedures. 
Efficiency is a local measure and is strictly valid only in a neighborhood of 
the null hypothesis. Although not as informative as a power curve,
efficiency is much easier to work with. Empirical studies are quoted to show
that in many practical situations the efficiency results are relevant for small
sample sizes.

A powerful motivation for the symmetry assumption is provided by the
paired data design presented in the following example.

Example 2.1.1. Suppose the pair of random variables $(T, C)$, representing
a treatment and control response, respectively, have a joint distribution
function $F(t, c)$. Assume that the treatment and control are assigned
independently and at random to the subjects of the experiment. Then the
null hypothesis of no difference between treatment and control implies that
$F(t, c) = F(c, t)$.

Next, we introduce $X = T - C$, the usual form for the data analysis.
This reduces the problem to a one-sample location model in which we
formulate hypotheses concerning $G(x) = P(X \le x)$.

Under the null hypothesis that $F(t, c) = F(c, t)$ it follows that
$P(T - C \le x) = P(C - T \le x) = P(-(T - C) \le x)$ and hence that $X$ and $-X$
have the same distribution. This means that under the null hypothesis, the
difference $X = T - C$ is symmetrically distributed about 0.

If the alternative hypothesis specifies that the treatment effect adds a
constant to the control then we will test $H_0: \theta = 0$ versus $H_A: \theta > 0$,
where $\theta$ is the center of the sampled symmetric population of differences.

There may of course be perfectly good reasons for assuming symmetry
of the underlying population distribution in the one-sample location model.
The paired data design just provides one source of such examples. Another
approach, advocated by many data analysts, is to transform the data.
However, the transformation is often selected on the basis of the data and
this makes it difficult to interpret the significance levels and confidence
coefficients.

We now introduce the subclass $\Omega_s$ of symmetric distributions in
$\Omega_0$, centered at 0. Let

$$\Omega_s = \{F : F \in \Omega_0 \text{ and } F(x) = 1 - F(-x)\}. \tag{2.1.1}$$

Hence $X$ (or $F$) is said to be symmetric about 0 and 0 is the unique median
and mean (when it exists). The sampling model is then given by
$X_1, \ldots, X_n$, a random sample from $F(x - \theta)$, $F \in \Omega_s$. Hence $\theta$ is the
unique median and mean (when it exists) at the center of the distribution.

In this sampling model, we assume that the experimental effect is
experienced solely in a change of location. This may not always be the case,
and we will see later that if the effect is to introduce asymmetry, the rank
tests may still tend to reject the null hypothesis (Example 2.5.1). However,
unless further qualified, the hypotheses under test are:

$$H_0: \theta = 0$$

$$H_A: \theta > 0$$

with $F \in \Omega_s$.

Example 2.1.2. Rosenzweig et al. (1972) describe experiments, carried out
in the 1960s, to determine the effects of environment on brain anatomy.
The hypothesis of such an effect can be traced back to an Italian
anatomist, Gaetano Malacarne, working in the 1780s. In the more recent
experiments three male rats from each of 12 litters are randomly assigned
to a standard laboratory cage, an enriched cage containing a variety of
toys, and an impoverished environment in which the rats lived in isolation.
Various measures of brain weight and enzymatic activity were taken. For
this example, the weight gain of the cortex over a specific period of time
was considered. If we compare rats in an impoverished environment to rats
in an enriched environment, we have a paired-data experiment. The pairs
are naturally formed by litter mates with the same genetic makeup. Let
$X$ ($Y$) denote the impoverished (enriched) measurement; then the basic
random variable of interest is $D = Y - X$. Under the null hypothesis of no
difference in the effects due to the two environments, $D$ has a distribution
symmetric about 0. If we let $\theta$ denote the center of the distribution of $D$,
then the experiment yields 12 observations $D_1, \ldots, D_{12}$ to test
$H_0: \theta = 0$ versus $H_A: \theta > 0$. Data for this example are presented in
Example 2.3.1.


2.2. THE WILCOXON SIGNED RANK TEST 

The sign test $S = \sum s(X_i)$ uses only information in the sign of the
observation; no metric information on how far the observation is from zero is
incorporated into the test. However, for a distribution that is symmetric
about 0, the vector of absolute values of the observations is a sufficient
statistic (see Lehmann, 1959, p. 56). Absolute value is just the distance from
0; so, for symmetric distributions, it would seem reasonable to try to use
this information.

In general, the rank of a quantity $Z_i$ among $Z_1, \ldots, Z_n$ is the number of
items $Z_k \le Z_i$, $k = 1, \ldots, n$. Hence the rank of $Z_i$ is its position in the
ordered set $Z_{(1)} < \cdots < Z_{(n)}$.

Returning to the sampling model $X_1, \ldots, X_n$ i.i.d. $F(x - \theta)$,
$F \in \Omega_s$, we rank $|X_1|, \ldots, |X_n|$ and define a statistic $T$ as the sum of
ranks of the positive sample items among the absolute values. We will reject
$H_0: \theta = 0$ in favor of $H_A: \theta > 0$ if $T \ge k$, where $k$ is determined by
the null distribution of $T$.

The statistic $T$ was proposed by Wilcoxon (1945) and is referred to as
the Wilcoxon signed rank statistic. When $\theta > 0$ and the symmetric
distribution is shifted to the right, positive observations tend to be farther
from 0 than the negative observations. Hence $T$ tends to be large and rejects
$H_0: \theta = 0$. In contrast to the sign statistic, $T$ takes into account, through
the ranks, the relative distances from 0.

As mentioned previously, it may happen that the median is 0 but the
distribution is skewed to the right. An example is given in Fig. 2.1. From
the figure it is easy to see that $T$ will tend to be large even though the
median is 0. Hence the symmetry assumption is necessary for an unambiguous
interpretation of large values of $T$. If the population median is known,
then $T$ provides a test of symmetry.

Define

$$W_j = \begin{cases} 1 & \text{if } |X|_{(j)} \text{ corresponds to a positive observation} \\ 0 & \text{otherwise,} \end{cases} \tag{2.2.1}$$

where $|X|_{(1)} < \cdots < |X|_{(n)}$ are the ordered absolute values. Then

$$T = \sum_{j=1}^{n} jW_j = \sum_{j=1}^{n} R_j s(X_j) \tag{2.2.2}$$

is the Wilcoxon signed rank statistic, where $R_j$ is the rank of $|X_j|$. In
order to study the distribution theory involving $T$ we need further notation.

Definition 2.2.1. If $R_j$ is the rank of $|X_j|$, then $|X_j| = |X|_{(R_j)}$. The
antirank is then defined by $D_j$ such that $|X_{D_j}| = |X|_{(j)}$. Hence $D_j$ labels
the $X$ which corresponds to the $j$th ordered absolute value.

From the definition of $D_j$ and from (2.2.1) it follows that

$$W_j = s(X_{D_j}) \tag{2.2.3}$$

where $s(x) = 1$ if $x > 0$ and 0 otherwise.
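
The relationship among ranks, antiranks, the indicators $W_j$, and the two
forms of $T$ can be traced on a small sample. The sketch below is illustrative
only: the function name and the five-point sample used in the usage note are
hypothetical, and the code assumes no zeros and no ties among the absolute
values.

```python
def signed_rank_pieces(x):
    """Return ranks R_i, antiranks D_j, indicators W_j, and T computed two ways."""
    n = len(x)
    order = sorted(range(n), key=lambda i: abs(x[i]))  # 0-based indices by |x|
    antiranks = [i + 1 for i in order]                 # D_j as 1-based labels
    ranks = [0] * n
    for j, i in enumerate(order, start=1):
        ranks[i] = j                                   # R_i, the rank of |x_i|
    w = [1 if x[i] > 0 else 0 for i in order]          # W_j = s(X_{D_j})
    t_from_w = sum(j * wj for j, wj in enumerate(w, start=1))
    t_from_ranks = sum(r for r, xi in zip(ranks, x) if xi > 0)
    return ranks, antiranks, w, t_from_w, t_from_ranks
```

For the hypothetical sample 1.2, -0.5, 3.1, -2.0, 0.7 the ranks of the
absolute values are 3, 1, 5, 4, 2 and both forms of the statistic give the
same value.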

Theorem 2.2.1. Under the null hypothesis H^: 9 = 0, F s{X^), . . . , 
s{X„) and the vector (i?,, .... R„) are mutually independent. 

Proof. Since (j(2f,), jX,)), / = 1, . . . , n are independent pairs we first need 
to show that s(,X,) and [A',] are independent. Consider 

P{s{X,)^ l,\X,\ < x) = /-(O < AT, < X) 

= n^) - m 

= F(x) - 1 
= (I/2)(2F(x)-l) 

= P{s{X,) = 1) - F(|Ar,| < X). 

Similarly for /’(j(A',) = 0, |Af,| < ;>:). Hence s(Xj) and |Af,| are independent. 
Since (i?„ . . . , 7?„) is a function of |Ar,l, . . . , \X„\, the theorem follows. 

Just as the signs are independent of the ranks, they are also independent
of the antiranks, and we have $s(X_1), \ldots, s(X_n)$ and $(D_1, \ldots, D_n)$ are
mutually independent.

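Theorem 2.2.2. Under the null hypothesis $H_0\colon \theta = 0$, $F \in \Omega_s$, $W_1, \ldots, W_n$
are independent, identically distributed with

\[
P(W_j = 0) = P(W_j = 1) = 1/2.
\]

Proof. Let $D = (D_1, \ldots, D_n)$ and $d = (d_1, \ldots, d_n)$; then using (2.2.3) and
Theorem 2.2.1 we have:

\[
\begin{aligned}
P(W_1 = w_1, \ldots, W_n = w_n) &= \sum_{d} P(s(X_{d_1}) = w_1, \ldots, s(X_{d_n}) = w_n \mid D = d)\, P(D = d) \\
&= \sum_{d} P(s(X_{d_1}) = w_1, \ldots, s(X_{d_n}) = w_n)\, P(D = d) \\
&= (1/2)^n \sum_{d} P(D = d) = (1/2)^n.
\end{aligned}
\]

Hence $P(W_1 = w_1, \ldots, W_n = w_n) = \prod_{j=1}^{n} P(W_j = w_j)$, since $P(W_j = w_j) = 1/2$,
and the theorem follows.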

Hence $T = \sum_{j=1}^{n} j W_j$ is a linear combination of i.i.d. $B(1, 1/2)$ random
variables under $H_0\colon \theta = 0$, $F \in \Omega_s$. This results in fairly straightforward
distribution theory under $H_0$ and shows that $T$ is distribution free under
$H_0$. Although a closed formula for the distribution of $T$ is not available, the
following example illustrates the simplicity of the distribution.

Example 2.2.1. For sample size 4, the possible ranks of the absolute values
are 1, 2, 3, 4 with the associated signs attached independently with
probability 1/2. We list a few of the $2^4 = 16$ configurations.

              Ranks
           1   2   3   4     Value of T
    Signs  +   +   +   +         10
           .   .   .   .          .

Since each configuration of signs has probability $1/2^4 = 1/16$, we can
enumerate the probabilities for $T$. For example, $P(T = 10) = 1/16$, $P(T = 6) = 2/16$,
and so on. If we agree to reject $H_0\colon \theta = 0$ if $T \ge 9$, then the
size of the test is $\alpha = P(T \ge 9) = 1/8$.
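The enumeration in Example 2.2.1 is small enough to carry out mechanically. The following is a minimal sketch, not part of the original text, that lists all $2^n$ sign configurations and tabulates $T$; the function name `null_distribution_by_enumeration` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product


def null_distribution_by_enumeration(n):
    """Return {t: P(T = t)} under H0 by listing all 2**n sign configurations."""
    counts = {}
    # signs[j-1] plays the role of W_j: 1 if the j-th smallest |X| is positive.
    for signs in product((0, 1), repeat=n):
        t = sum(rank * w for rank, w in zip(range(1, n + 1), signs))
        counts[t] = counts.get(t, 0) + 1
    # Each configuration has probability 1 / 2**n (Theorem 2.2.2).
    return {t: Fraction(c, 2 ** n) for t, c in sorted(counts.items())}


dist4 = null_distribution_by_enumeration(4)
size = sum(p for t, p in dist4.items() if t >= 9)  # size of the test "reject if T >= 9"
print(dist4[10], dist4[6], size)
```

Exact fractions are used so that the probabilities can be compared directly with the values quoted in the example.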

The mean and variance of $T$, under $H_0$, are easily derived from Theorem
2.2.2 as (see Exercise 2.10.1):

\[
E_0 T = n(n+1)/4, \qquad \operatorname{Var}_0 T = n(n+1)(2n+1)/24. \tag{2.2.4}
\]


In fact, the moment-generating function for $T$ is easy to derive from
Theorem 2.2.2.

Example 2.2.2. Under $H_0\colon \theta = 0$, $F \in \Omega_s$, the moment-generating function
of $T$ is given by:

\[
M(t) = \frac{1}{2^n} \prod_{j=1}^{n} (1 + e^{tj}). \tag{2.2.5}
\]

Proof. From the definition of $M(t)$ and Theorem 2.2.2 we have

\[
M(t) = E(e^{tT}) = E e^{t \sum_j j W_j} = \prod_{j=1}^{n} E e^{tjW_j}.
\]

Now $E e^{tjW_j} = e^{0}/2 + e^{tj}/2 = (1 + e^{tj})/2$, and the result follows.

The moments, given by (2.2.4), could be derived from (2.2.5). Since the
probability function for $T$ cannot be given in closed form, the moment-generating
function provides an alternative method for constructing the
probabilities of $T$. Hence if $M(t) = a_0 e^{0t} + a_1 e^{t} + a_2 e^{2t} + \cdots$, then
$P(T = j) = a_j$. In the following example we show how these probabilities can be
developed in a systematic way.


Example 2.2.3. We begin with $n = 2$, and thus

\[
M(t) = (1/2^2)(1 + e^{t})(1 + e^{2t}) = (1/2^2)(1 + e^{t} + e^{2t} + e^{3t}).
\]

We now list the powers and the coefficients:

    Powers                 0  1  2  3
    Coefficients   1/4 x   1  1  1  1                       (2.2.6)

Thus, $P(T = 0) = P(T = 1) = P(T = 2) = P(T = 3) = 1/4$. For $n = 3$, we have

\[
M(t) = (1/2^3)(1 + e^{t})(1 + e^{2t})(1 + e^{3t})
\]

and the powers and coefficients are developed from (2.2.6) as

    Powers                 0  1  2  3 | 4  5  6
                           1  1  1  1 |
                                    1 | 1  1  1
    Coefficients   1/8 x   1  1  1  2 | 1  1  1             (2.2.7)

The vertical dividing line in the first line of (2.2.7) indicates where additional
powers for the next sample size are added. For $n = 4$ we have

\[
M(t) = (1/2^4)(1 + e^{t})(1 + e^{2t})(1 + e^{3t})(1 + e^{4t}).
\]

Hence from (2.2.7) we have

    Powers                 0  1  2  3  4  5  6 | 7  8  9  10
                           1  1  1  2  1  1  1 |
                                       1  1  1 | 2  1  1  1
    Coefficients   1/16 x  1  1  1  2  2  2  2 | 2  1  1  1

and, for example, $P(T = 7) = 2/16 = 1/8$.

The pattern of this example should now be obvious. A computer can
easily be programmed to develop tables of the distribution of $T$ under $H_0$.
Further, the example provides evidence for the symmetry of the distribution
of $T$ (see Exercise 2.10.2). A recurrence formula for the distribution of
$T$ is given in Exercise 3.7.3.
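The shift-and-add pattern of Example 2.2.3 can be programmed in a few lines. The sketch below is one possible way to do it, with the illustrative name `signed_rank_counts`: it multiplies out $\prod_j (1 + e^{tj})$ one factor at a time, keeping only the integer coefficients.

```python
def signed_rank_counts(n):
    """Coefficients a_0, ..., a_N of prod_{j=1}^{n} (1 + e^{tj}), N = n(n+1)/2.

    Dividing each coefficient by 2**n gives P(T = k) under H0, as in (2.2.5).
    """
    counts = [1]  # empty product: only T = 0 is possible
    for j in range(1, n + 1):
        padded = counts + [0] * j    # the "1" term: powers unchanged
        shifted = [0] * j + counts   # the "e^{tj}" term: every power moves up by j
        counts = [a + b for a, b in zip(padded, shifted)]
    return counts


print(signed_rank_counts(3))
print(signed_rank_counts(4))
```

Each pass of the loop corresponds to one new sample size in the example: the old row of coefficients is written down, written again shifted $j$ places to the right, and the two rows are added.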


As in the case of the sign test, when the sample size is large or when a
table of the distribution is not available, it is important to have a normal
approximation for the distribution of $T$ under $H_0$. From the independence
of $W_1, \ldots, W_n$, Theorem 2.2.2, we can still rely on central limit theory.
However, $T$ is a linear function of $W_1, \ldots, W_n$, and the usual Central
Limit Theorem must be extended. The extension that we use is proved in
Theorem A9 of the Appendix. In Exercise 2.10.4 you are asked to apply
Theorem A9 to $T$ and show that

\[
\frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}
\]

has a standard normal limiting distribution. Hence if we reject $H_0\colon \theta = 0$
for $H_A\colon \theta > 0$ when $T \ge c$, then the size $\alpha$ critical value can be approximated
(with continuity correction) by

\[
c \doteq \frac{n(n+1)}{4} + .5 + Z_\alpha \sqrt{\frac{n(n+1)(2n+1)}{24}} \tag{2.2.9}
\]

where $Z_\alpha$ is the upper $\alpha$ percentile of the standard normal distribution.
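As a quick numerical illustration of (2.2.9), the sketch below evaluates the approximate critical value. The choice $n = 12$, $\alpha = .01$ is only an example (the same values appear later in Example 2.3.1), and `approx_critical_value` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def approx_critical_value(n, alpha):
    """Normal approximation (2.2.9) to the size-alpha critical value of T."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper alpha percentile of n(0, 1)
    return mean + 0.5 + z_alpha * sd


c = approx_critical_value(12, 0.01)
# T is integer valued when there are no ties, so "T >= c" means T >= ceil(c).
print(round(c, 2), math.ceil(c))
```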

The accuracy of the approximation is usually increased by using an
Edgeworth approximation; see Cramér (1946, Sections 12.6 and 17.7). If $V$
has a symmetric distribution, the approximation is given by

\[
P(V \le v) \doteq \Phi(t) - \frac{\kappa_4}{24}(t^3 - 3t)\varphi(t), \tag{2.2.10}
\]

where $\kappa_4 = \{E(V - EV)^4/[E(V - EV)^2]^2\} - 3$, called the excess, is a function
of the kurtosis, $\Phi(\cdot)$ and $\varphi(\cdot)$ are the standard normal cdf and pdf, and
$t = (v - EV)/(\operatorname{Var} V)^{1/2}$. (If $V$ is discrete, $t$ usually contains a correction for continuity.)
For the Wilcoxon signed rank statistic, Fellingham and Stoker (1964) show

\[
P(T \le k) \doteq \Phi(t) + \frac{3n^2 + 3n - 1}{10n(n+1)(2n+1)}(t^3 - 3t)\varphi(t), \tag{2.2.11}
\]

where $t = (k + .5 - ET)/(\operatorname{Var} T)^{1/2}$. Example 2.2.2 can be used to find the
necessary moments. Bickel (1974) has shown this is a rigorous asymptotic
expansion taken to terms of order $n^{-1}$. See Table 2.1.


Table 2.1. Normal and Edgeworth Approximations to the Null
Distribution of T, with Continuity Correction and n = 5

    k             0     1     2     3     4     5     6     7
    P(T <= k)   .031  .062  .094  .156  .219  .312  .406  .500
    Normal      .029  .053  .089  .140  .212  .295  .394  .500
    Edgeworth   .027  .055  .096  .152  .227  .309  .402  .500

There are various other approaches for establishing the limiting normality
of $T$. Several are based on the moments or moment-generating function.
See the papers by Haigh (1971) and Noether (1970).
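The three rows of Table 2.1 can be recomputed from the formulas of this section. The following sketch assumes the exact null distribution is obtained from the generating-function coefficients of (2.2.5) and uses (2.2.11) for the Edgeworth row; `table_row` is an illustrative name, and small differences in the third decimal place relative to the printed table are to be expected from rounding.

```python
import math
from statistics import NormalDist


def table_row(n, k):
    """Return (exact, normal, Edgeworth) values of P(T <= k) under H0."""
    # Exact: coefficients of prod_j (1 + e^{tj}), built by shift-and-add.
    counts = [1]
    for j in range(1, n + 1):
        counts = [a + b for a, b in zip(counts + [0] * j, [0] * j + counts)]
    exact = sum(counts[: k + 1]) / 2 ** n

    # Normal approximation with continuity correction.
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    t = (k + 0.5 - mean) / math.sqrt(var)
    std = NormalDist()
    normal = std.cdf(t)

    # Edgeworth correction term from (2.2.11).
    coef = (3 * n * n + 3 * n - 1) / (10 * n * (n + 1) * (2 * n + 1))
    edgeworth = normal + coef * (t ** 3 - 3 * t) * std.pdf(t)
    return exact, normal, edgeworth


for k in range(8):
    print(k, *(f"{v:.3f}" for v in table_row(5, k)))
```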


2.3. POINT AND INTERVAL ESTIMATES BASED ON THE
WILCOXON SIGNED RANK STATISTIC

Recall Section 1.5 in which we defined, for the sign test, $S(\theta) = \#[X_i > \theta]$,
$i = 1, \ldots, n$. The Hodges-Lehmann estimate and the interval estimate of $\theta$
based on $S$ then follow easily from the graph of $S(\theta)$, which is a nonincreasing
step function with steps at the ordered sample values. In this
section we derive a counting form for the Wilcoxon signed rank statistic
which enables us to easily construct the corresponding point and interval
estimates.

Definition 2.3.1. Given a random sample $X_1, \ldots, X_n$, the $n(n+1)/2$
Walsh averages are defined by $(X_i + X_j)/2$, $i \le j$. This name was given to
the pairwise averages by John Tukey in reference to work of John Walsh.
Tukey (1949) then went on to develop the following representation.

Theorem 2.3.1. The Wilcoxon signed rank statistic $T$ defined in (2.2.2) can
be written as

\[
T = \#\left[\frac{X_i + X_j}{2} > 0\right], \quad i \le j.
\]

Hence $T$ is the number of positive Walsh averages.

Proof. Let $X_{i_1}, \ldots, X_{i_p}$ denote the $p$ positive sample items; then $T$ is the
sum of ranks of these items among the absolute values.

Draw a circle with center at the origin and radius $X_{i_1}$, as in Fig. 2.2.
Then the rank of $X_{i_1}$ is equal to the number of sample points in the circle
including $X_{i_1}$, since we are ranking the distance to 0. Every average formed
by a sample point in the circle and $X_{i_1}$ is positive. Hence the rank of $X_{i_1}$ is
just the number of positive Walsh averages formed by $X_{i_1}$ and sample
points less than or equal to $X_{i_1}$. If this procedure is repeated systematically
for $i_2, \ldots, i_p$ then the sum of ranks is seen to be identical to the number of
positive Walsh averages.

Figure 2.2. Counting positive Walsh averages.
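Theorem 2.3.1 is easy to check numerically. The sketch below computes $T$ both ways for a small made-up sample (the data values are purely illustrative) under the assumption of no zeros and no ties among the absolute values.

```python
def signed_rank_statistic(xs):
    """Sum of the ranks, among the absolute values, of the positive observations.

    Assumes no zeros and no ties among the |x|.
    """
    ordered = sorted(xs, key=abs)
    return sum(rank for rank, x in enumerate(ordered, start=1) if x > 0)


def positive_walsh_count(xs):
    """Number of Walsh averages (x_i + x_j)/2, i <= j, that are positive."""
    n = len(xs)
    return sum(1 for i in range(n) for j in range(i, n) if (xs[i] + xs[j]) / 2 > 0)


sample = [3.1, -1.2, 0.4, -2.5, 5.0]  # hypothetical data
print(signed_rank_statistic(sample), positive_walsh_count(sample))
```

The positive items 0.4, 3.1, and 5.0 have ranks 1, 4, and 5 among the absolute values, so both functions return the same count.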

We now see that

\[
T(\theta) = \#\left[\frac{X_i + X_j}{2} > \theta\right], \quad i \le j, \tag{2.3.1}
\]

is a nonincreasing step function with steps at the Walsh averages. It follows
at once from Definition 1.4.1 and Exercise 1.8.4 that $T$ provides an
unbiased test of $H_0\colon \theta = 0$ versus $H_A\colon \theta > 0$, $F \in \Omega_s$.

From the signlike or counting structure of $T$, and the symmetry of its
null distribution (Exercise 2.10.2) we can at once write down the Hodges-Lehmann
estimate of $\theta$ (see Section 1.5) as

\[
\hat{\theta} = \operatorname*{med}_{i \le j} \left(\frac{X_i + X_j}{2}\right), \tag{2.3.2}
\]

the median of the Walsh averages. Further, if $W_{(1)} \le \cdots \le W_{(N)}$, $N = n(n+1)/2$,
are the ordered Walsh averages and $P(T \le a) = \alpha/2 = P(T \ge N - a)$, then

\[
[W_{(a+1)}, W_{(N-a)}) \tag{2.3.3}
\]

is the $(1 - \alpha)100\%$ confidence interval for $\theta$ based on $T$. From (2.2.9), $a$
can be approximated (with continuity correction) by

\[
a \doteq \frac{n(n+1)}{4} - .5 - Z_{\alpha/2} \sqrt{\frac{n(n+1)(2n+1)}{24}}.
\]

If $n$ is of moderate size, the number of Walsh averages will be quite large. It
is then not practical to compute the estimate or confidence interval by
hand. The Minitab statistical computing system contains the commands
WTEST and WINT which provide the Wilcoxon signed rank test, point
estimate, and confidence interval.

The next theorem shows that under most sampling situations $\hat{\theta}$ is an
unbiased estimate of $\theta$. Later in Theorem 2.6.5, we will show that $\hat{\theta}$ is
approximately normally distributed.

Theorem 2.3.2. If $F \in \Omega_s$, then the distribution of $\hat{\theta}$ is symmetric about $\theta$.

Proof. We first observe that $P_\theta(\hat{\theta} - \theta \le x) = P_0(\hat{\theta} \le x)$ by Theorem 1.5.1.
Hence we take $\theta = 0$ without loss of generality. Now $X = (X_1, \ldots, X_n)$ and
$-X = (-X_1, \ldots, -X_n)$ have the same joint distribution since $F \in \Omega_s$. If
we write $\hat{\theta}(X)$ for $\operatorname{med}(X_i + X_j)/2$, $i \le j$, then $\hat{\theta}(X)$ and $\hat{\theta}(-X)$ have the
same distribution, and $\hat{\theta}(-X) = -\hat{\theta}(X)$ implies that $\hat{\theta}$ and $-\hat{\theta}$ have the
same distribution. Hence the theorem follows.

One drawback of the median of the Walsh averages is the large number
of Walsh averages that must be computed and ordered. For example, if
$n = 50$ then there are $n(n+1)/2 = 1275$ Walsh averages. Estimates such
as the sample median and Galton's estimate (see Exercise 2.10.7) require
less computation and have positive tolerance. If they have good statistical
efficiency properties they could provide an attractive alternative to the
median of the entire set of Walsh averages. We discuss the statistical
properties in later sections of the chapter.

In summary, when sampling from a symmetric distribution with point of
symmetry $\theta$, $T$, the sum of ranks of the positive items among the absolute
values, provides a natural test of $H_0\colon \theta = 0$ versus $H_A\colon \theta > 0$. This test
utilizes information in the symmetry by ranking the distances of the
observations from the origin. The test is distribution free under $H_0$ and the
underlying hope is that it provides more power for detecting symmetric
shift alternatives than the sign test. This problem of power and efficiency is
taken up in Sections 2.5 and 2.6. We have further seen that, just as in the
sign test case, natural point and interval estimates based on $T$ are available.
Under the model $F \in \Omega_s$, $\hat{\theta}$ is an unbiased estimate of $\theta$ and $T$ provides an
unbiased test of $H_0\colon \theta = 0$ versus $H_A\colon \theta > 0$. In the next section we
consider the stability properties of the Wilcoxon procedures.

Example 2.3.1. In Example 2.1.2 we considered littermate rats randomly
assigned to an enriched or impoverished environment. The measurement
taken was the weight in milligrams of the cortex after a fixed period of
time. There were 12 pairs and the data are given in Table 2.2.

Table 2.2. Cortex Weight (mg) (Artificial Data)

                                        Pair
    Environment     1    2    3    4    5    6    7    8    9   10   11   12
    Enriched      689  663  653  740  699  690  685  718  742  651  687  679
    Impoverished  657  646  642  650  698  621  647  689  652  661  612  678
    Difference     32   17   11   90    1   69   38   29   90  -10   75    1

We suppose that the differences are symmetrically distributed about $\theta$ and
we wish to test $H_0\colon \theta = 0$ versus $H_A\colon \theta > 0$ based on the 12 observed
differences. If the significance level is taken to be $\alpha = .01$, then we reject
$H_0\colon \theta = 0$ if $T \ge 68$ [the normal approximation, (2.2.9), yields $T \ge 70$]. The
observed value of $T$ is 75; hence we reject $H_0\colon \theta = 0$ and claim, at $\alpha = .01$,
that the cortex weight of rats raised in an enriched environment is significantly
greater than that of rats raised in an impoverished environment. The
point estimate of $\theta$ is $\hat{\theta} = 36.5$, the median of the 78 Walsh averages. A
94.5% confidence interval for $\theta$ is determined by the 15th and the 64th
ordered Walsh averages: [11.0, 59.5).
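The quantities quoted in Example 2.3.1 can be recomputed directly from the differences in Table 2.2. The sketch below is illustrative only: it forms the 78 Walsh averages, counts the positive ones to obtain $T$ (Theorem 2.3.1), and reads off the median and the 15th and 64th ordered values. Variable names are arbitrary.

```python
from statistics import median

# Differences (enriched minus impoverished) from Table 2.2.
differences = [32, 17, 11, 90, 1, 69, 38, 29, 90, -10, 75, 1]

n = len(differences)
walsh = sorted(
    (differences[i] + differences[j]) / 2 for i in range(n) for j in range(i, n)
)

t_observed = sum(1 for w in walsh if w > 0)  # T as the number of positive Walsh averages
theta_hat = median(walsh)                    # Hodges-Lehmann estimate (2.3.2)
lower, upper = walsh[15 - 1], walsh[64 - 1]  # 15th and 64th ordered Walsh averages

print(len(walsh), t_observed, theta_hat, (lower, upper))
```

No Walsh average is exactly zero for these data, so counting positive averages agrees with the midrank value of $T$ even though the differences contain ties.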

In practice it is often necessary to deal with zeros (observations that are 
equal to the null hypothesized value) and with ties among the absolute 
values. As in the case of the sign test, we set aside the zeros and reduce the 
sample size accordingly before computing the Wilcoxon signed rank test. 
Zeros are not excluded for the purposes of estimation. Observations that 
have the same absolute value are assigned the average rank for that set of 
observations. These midranks are then used in the calculation of T. This 
corresponds to counting a zero Walsh average as 1/2. There are other 
methods for handling ties and none is superior to the others in all situa- 
tions. The midrank method is the one most commonly used in practice; 
Conover (1973) discusses and compares several of the methods. See also 
Lehmann (1975) and Pratt (1959). Noether (1967) shows that, when the 
underlying population is discrete, the closed confidence interval always has 
confidence coefficient bounded below by the stated confidence coefficient. 


2.4. STABILITY PROPERTIES OF RANK TESTS AND ESTIMATES 

In Exercise 2.10.5 you are asked to compute the testing tolerance of T to 
both acceptance and rejection and to show that the asymptotic tolerance in 
both cases is .29. Hence T provides a test with positive tolerance, between 
the highly tolerant sign test and the intolerant t test. 

We now turn to a general study of tolerance of estimators which are 
constructed from the Walsh averages in a special way (see Hodges, 1967). 
Note that the sample median is the median of the set of Walsh averages 
restricted by i = j (see Definition 2.3.1). 

Let $B$ be a subset of indices $(i, j)$ contained in the set $\{(i, j): i \le j\}$ and
define the estimate $\hat{\theta}$ by

\[
\hat{\theta} = \operatorname*{med}_{(i,j) \in B} \frac{X_{(i)} + X_{(j)}}{2} \tag{2.4.1}
\]

where $X_{(1)} \le \cdots \le X_{(n)}$ are the ordered sample values. For example,
$B = \{(i, j): i = j\}$ defines the sample median, and $B = \{(i, j): i \le j\}$ defines
the median of the Walsh averages. Another estimate, attributed to
Galton by Hodges (1967), is defined by $B = \{(i, j): i + j = n + 1\}$. A
convenient way to visualize the index set, $B$, is given in Fig. 2.3. The set
$\{(i, j): i \le j\}$ is represented by the lattice points in the upper triangle. The
sets $B_1$ and $B_2$, defining the sample median and Galton's estimate, respectively,
are shown. Other estimates are illustrated later.

Figure 2.3. Index sets for estimates.
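The family (2.4.1) lends itself to a single routine parameterized by the index set. The following sketch is one possible arrangement, not the book's: the index set $B$ is passed as a predicate on $(i, j, n)$ with 1-based indices, and the three sets named in the text are supplied as examples. The sample values are made up.

```python
from statistics import median


def index_set_estimate(xs, in_b):
    """Median of (X_(i) + X_(j))/2 over the pairs i <= j selected by in_b(i, j, n)."""
    ordered = sorted(xs)
    n = len(ordered)
    averages = [
        (ordered[i - 1] + ordered[j - 1]) / 2
        for i in range(1, n + 1)
        for j in range(i, n + 1)
        if in_b(i, j, n)
    ]
    return median(averages)


def sample_median_set(i, j, n):
    return i == j            # B = {(i, j): i = j}


def walsh_set(i, j, n):
    return True              # B = {(i, j): i <= j}


def galton_set(i, j, n):
    return i + j == n + 1    # B = {(i, j): i + j = n + 1}


data = [2, 7, 1, 10, 4]  # hypothetical sample
print(
    index_set_estimate(data, sample_median_set),
    index_set_estimate(data, walsh_set),
    index_set_estimate(data, galton_set),
)
```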

If the pair $(i, j)$ in $B$ implies that $(n - j + 1, n - i + 1)$ is also in $B$, the
set $B$ is said to be symmetric. The following theorem shows that a
symmetric index set produces unbiased estimates of $\theta$ when sampling from
a symmetric population.

Theorem 2.4.1. If the sampling model is given by $F(x - \theta)$ with $F \in \Omega_s$,
and if $B$ is symmetric, then the estimate $\hat{\theta}$, (2.4.1), is symmetrically
distributed about $\theta$.

Proof. Without loss of generality take $\theta = 0$. From the form of the joint
density of pairs of order statistics, it is easy to see that $(X_{(i)}, X_{(j)})$ and
$(-X_{(n-j+1)}, -X_{(n-i+1)})$ have the same distribution. Hence $\hat{\theta}(X)$ based on
$X = (X_1, \ldots, X_n)$ has the same distribution as $\hat{\theta}(-X)$. But $\hat{\theta}(-X) =
-\hat{\theta}(X)$, hence $\hat{\theta}$ and $-\hat{\theta}$ have the same distribution, and $\hat{\theta}$ is symmetrically
distributed about 0.

Theorem 2.4.2. Suppose $B$ is symmetric. Then the tolerance of $\hat{\theta}$, (2.4.1), is
$a/n$, where $a$ is the largest integer such that

\[
(1/2)[\#B + 1] < \#K_{a+1} \tag{2.4.2}
\]

where $K_{a+1} = \{(i, j) \in B: a + 1 \le i \le j\}$ and # denotes the number of elements in the
set.

Proof. We will prove the result for $\#B = 2q$, even. The odd case is left as
Exercise 2.10.6.

Now $a$ is the largest integer such that

\[
(1/2)[2q + 1] = q + 1/2 < \#K_{a+1}.
\]

Hence $q + 1 \le \#K_{a+1}$. This shows that $\#K_{a+1}$ exceeds half the $\#B$.
Further $\#K_{a+2} \le q$ because of the maximal property of $a$.

If $(i, j) \in K_{a+1}$, then $a + 1 \le i \le j$ and

\[
X_{(a+1)} = \frac{X_{(a+1)} + X_{(a+1)}}{2} \le \frac{X_{(i)} + X_{(j)}}{2}.
\]

Hence more than half of the Walsh averages defined by $B$ are bounded
below by $X_{(a+1)}$, and it follows that $X_{(a+1)} \le \hat{\theta}$,
as required in Definition 1.6.1.

Now fix $X_{(a+2)}, \ldots, X_{(n)}$ and let $X_{(a+1)} \to -\infty$. If $i \le a + 1$, then as
$X_{(a+1)} \to -\infty$,

\[
\frac{X_{(i)} + X_{(j)}}{2} \le \frac{X_{(a+1)} + X_{(j)}}{2} \to -\infty.
\]

Since $K_{a+2} = \{(i, j): a + 2 \le i \le j\}$, $i \le a + 1$ implies that $(i, j)$ is in the
complement of $K_{a+2}$. Hence all Walsh averages defined by the complement
of $K_{a+2}$ tend to $-\infty$. We have seen that $\#K_{a+2} \le q$, so the number of
elements in the complement of $K_{a+2}$ is at least $q$, at least half the
$\#B$. This means that $\hat{\theta}$ depends on a Walsh average in the complement of $K_{a+2}$, and hence
$\hat{\theta} \to -\infty$ as $X_{(a+1)} \to -\infty$, as required by Definition 1.6.1.

The rest of the conditions in Definition 1.6.1 follow from the symmetry
of $B$.

Example 2.4.1. Let $B = \{(i, j): i \le j\}$, so that $\hat{\theta}$ is the median of the
Walsh averages, (2.3.2). In this case $\#B = n(n+1)/2$. The set $K_{a+1} =
\{(i, j): a + 1 \le i \le j\}$, and hence $\#K_{a+1} = (n - a) + (n - a - 1) + \cdots
+ 1 = (n - a)(n - a + 1)/2$. The theorem shows that we must find the
largest $a$ such that

\[
\frac{1}{2}\left[\frac{n(n+1)}{2} + 1\right] < \frac{(n - a)(n - a + 1)}{2}. \tag{2.4.3}
\]

This reduces to the quadratic inequality

\[
a^2 - (2n + 1)a + (n^2 + n - 2)/2 > 0,
\]

and $a$ is the greatest integer in

\[
\frac{2n + 1 - \sqrt{2n^2 + 2n + 5}}{2}. \tag{2.4.4}
\]

The other solution of the quadratic yields a value of $a$ outside the range 1
to $n$. If we disregard the noninteger character of (2.4.4), the tolerance
(Definition 1.6.1) can be written as

\[
\tau_n \doteq \frac{2n + 1 - \sqrt{2n^2 + 2n + 5}}{2n}
\]

and

\[
\tau = \lim \tau_n = 1 - \frac{1}{\sqrt{2}} \doteq .29.
\]

We see, as with the sign test and the sample median, that the testing
tolerances of the Wilcoxon signed rank test (Exercise 2.10.5) converge to
the asymptotic estimation tolerance of the Hodges-Lehmann estimate, the
median of the Walsh averages. The convergence of $\tau_n$ to $\tau$ is quite rapid;
for example, some values of $(n, \tau_n)$ are (4, .286), (6, .297), (10, .300), (20,
.298).
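The values of $(n, \tau_n)$ quoted in Example 2.4.1 follow from (2.4.4) with the integer part ignored. As a hedged illustration, the sketch below evaluates that expression and, separately, finds the integer $a$ by searching inequality (2.4.3) directly; `tau_n` and `largest_a` are hypothetical helper names, and `largest_a` assumes $n \ge 2$.

```python
import math


def tau_n(n):
    """Tolerance from (2.4.4), disregarding the integer restriction on a."""
    return (2 * n + 1 - math.sqrt(2 * n * n + 2 * n + 5)) / (2 * n)


def largest_a(n):
    """Largest a with (1/2)(#B + 1) < #K_{a+1} for B = {(i, j): i <= j}, n >= 2."""
    threshold = (n * (n + 1) / 2 + 1) / 2
    a = 0
    # Keep increasing a while the next candidate a + 1 still satisfies (2.4.3).
    while (n - (a + 1)) * (n - (a + 1) + 1) / 2 > threshold:
        a += 1
    return a


for n in (4, 6, 10, 20):
    print(n, round(tau_n(n), 3), largest_a(n), largest_a(n) / n)
print(1 - 1 / math.sqrt(2))
```

Because (2.4.3) is a strict inequality, the integer search can give a value one smaller than the integer part of (2.4.4) when that expression is itself an integer, as happens at $n = 10$.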

We complete this section with a discussion of the sensitivity and influence
curves of the median of the Walsh averages. The results of the
following example can be compared to Example 1.6.4.

Example 2.4.2. We consider $\hat{\theta} = \operatorname{med}(X_i + X_j)/2$, $i \le j$, (2.3.2), the
Hodges and Lehmann estimate of $\theta$, derived from the Wilcoxon signed
rank statistic. We suppose that $\hat{\theta}$ is based on a sample of size $n$ from
$H(x) = F(x - \theta)$, $F \in \Omega_s$. To construct the influence curve we need to
represent $\hat{\theta}$ as a functional evaluated at the empirical cdf, $H_n(x)$. This
functional is implicitly defined by setting the cdf of $(X_i + X_j)/2$ equal to
1/2. To simplify the discussion, we consider $\theta^* = \operatorname{med}(X_i + X_j)/2$, $i, j = 1,
\ldots, n$.

The cdf of $(X_i + X_j)/2$ is given by

\[
P\left(\frac{X_i + X_j}{2} \le z\right) = \int_{-\infty}^{\infty} \int_{-\infty}^{2z - u} h(x) h(u)\, dx\, du
= \int_{-\infty}^{\infty} H(2z - u) h(u)\, du.
\]

Next define the functional $T(H)$, the median of the distribution of $(X_i +
X_j)/2$, implicitly by

\[
\int_{-\infty}^{\infty} H(2T(H) - u) h(u)\, du = 1/2. \tag{2.4.5}
\]

First we will show $\theta^* = T(H_n)$ and then we will derive the influence
curve. Replace $H$ by $H_n$ in (2.4.5) to get

\[
\int_{-\infty}^{\infty} H_n(2T(H_n) - u)\, dH_n(u) = 1/2
\]

and

\[
\frac{1}{n} \sum_{i=1}^{n} H_n(2T(H_n) - x_i) = 1/2.
\]

Now $H_n(2T(H_n) - x_i) = \sum_{j=1}^{n} I(x_j \le 2T(H_n) - x_i)/n$, where $I(\cdot)$ is the
indicator function. Hence we have

\[
\frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} I\left(\frac{x_i + x_j}{2} \le T(H_n)\right) = 1/2
\]

and this defines $T(H_n)$ as the median of the averages $(x_i + x_j)/2$,
$i, j = 1, 2, \ldots, n$.

We now apply Definition 1.6.4 to find the influence curve. With $H_t
= (1 - t)H + t\delta_y = H + t(\delta_y - H)$, we have, from (2.4.5),

\[
\int_{-\infty}^{\infty} \{H(2T(H_t) - u) + t[\delta_y(2T(H_t) - u) - H(2T(H_t) - u)]\}\, d\{H(u) + t[\delta_y(u) - H(u)]\} = 1/2.
\]

Expand this expression by multiplying out the integrand and integrator and
then differentiate with respect to $t$. Differentiation and integration can be
interchanged by applying Theorem A18 since the integrand is bounded and
differentiable. The result of evaluating at $t = 0$ is

\[
\begin{aligned}
2 \frac{d}{dt} T(H_t)\Big|_{t=0} \int_{-\infty}^{\infty} h(2T(H) - u)\, dH(u)
&+ \int_{-\infty}^{\infty} H(2T(H) - u)\, d\delta_y(u)
- \int_{-\infty}^{\infty} H(2T(H) - u)\, dH(u) \\
&+ \int_{-\infty}^{\infty} \delta_y(2T(H) - u)\, dH(u)
- \int_{-\infty}^{\infty} H(2T(H) - u)\, dH(u) = 0.
\end{aligned}
\]

Since $H(x) = F(x - \theta)$, $F \in \Omega_s$, without loss of generality we let $\theta = 0$.
This means that in the foregoing equation we can take $T(H) = 0$, just as we
let the mean and median be 0 in Example 1.6.5.

Now $F(-u) = 1 - F(u)$, $f(-u) = f(u)$, and so $\int F(-u)\, dF(u) = 1/2$,
$\int \delta_y(-u)\, dF(u) = F(-y) = 1 - F(y)$, and $\int F(-u)\, d\delta_y(u) = F(-y) = 1 - F(y)$.
The foregoing equation now becomes

\[
2 \frac{d}{dt} T(H_t)\Big|_{t=0} \int f^2(u)\, du + (1 - F(y)) - 1/2 + (1 - F(y)) - 1/2 = 0
\]

and

\[
\Omega(y) = \frac{d}{dt} T(H_t)\Big|_{t=0} = \frac{2F(y) - 1}{2 \int f^2(u)\, du} = \frac{F(y) - 1/2}{\int f^2(u)\, du}. \tag{2.4.6}
\]

Hence the influence curve is the cdf, centered and scaled. This means that
$\theta^*$ (and $\hat{\theta}$) has a bounded, continuous influence curve and is robust. Now
from (2.4.6), since $\int [F(y) - 1/2]^2\, dF(y) = 1/12$, we have $\sigma^2 = \int \Omega^2(y)\,
dF(y) = 1/\{12[\int f^2(y)\, dy]^2\}$. Hence (1.6.2) suggests $n^{1/2}(\hat{\theta} - \theta)$ is asymptotically
$n(0, \sigma^2)$.
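As a worked illustration of (2.4.6), not taken from the original text, suppose $F$ is the standard normal cdf $\Phi$ with density $\varphi$. The integral of the squared density, the influence curve, and the asymptotic variance can then be written out explicitly:

```latex
\[
\int \varphi^2(u)\,du = \int \frac{1}{2\pi} e^{-u^2}\,du = \frac{1}{2\sqrt{\pi}},
\qquad
\Omega(y) = 2\sqrt{\pi}\,\bigl[\Phi(y) - \tfrac{1}{2}\bigr],
\]
\[
\sigma^2 = \frac{1}{12\,\bigl[1/(2\sqrt{\pi})\bigr]^2} = \frac{4\pi}{12} = \frac{\pi}{3} \approx 1.047 .
\]
```

Under this assumed model the influence curve is bounded by $\pm\sqrt{\pi}$, and the asymptotic variance is only slightly larger than the value 1 attained by the sample mean at the normal model.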

The stylized sensitivity curve for $\hat{\theta}$ is a bounded, nondecreasing step
function, much like an empirical cdf that has been centered and scaled. For
large $n$ the sensitivity curve looks like (2.4.6); see the Princeton robustness
study (Andrews et al., 1972, Section 5).

2.5. GENERAL ASYMPTOTIC THEORY FOR THE WILCOXON
SIGNED RANK STATISTIC

We now consider a random sample $X_1, \ldots, X_n$ from an arbitrary, continuous
distribution $H(x)$. Later we will want to consider the location model
$H(x) = F(x - \theta)$. First, we develop the mean and variance of $T$ and
establish the consistency of the test based on $T$. Next, we point out that
under alternative hypotheses, $T$ can no longer be represented as the sum of
independent random variables. This means that the Central Limit Theorem
cannot be applied directly. In order to find the limiting distribution, it is
necessary to project $T$ onto the class of sums of independent random
variables. The strategy is then to apply the Central Limit Theorem to the
sum and show that the difference between $T$ and its projection tends to
zero in probability. After establishing the asymptotic distribution of $T$ in
general, we will consider the asymptotic power. Power considerations
motivate the development of asymptotic efficiency which will then be
developed in the next section.

The mean and variance of $T$, in general, are determined by the following
parameters which depend on the underlying distribution:

\[
\begin{aligned}
p_1 &= P(X_1 > 0) \\
p_2 &= P(X_1 + X_2 > 0) \\
p_3 &= P(X_1 + X_2 > 0, X_1 > 0) \\
p_4 &= P(X_1 + X_2 > 0, X_1 + X_3 > 0).
\end{aligned} \tag{2.5.1}
\]

In Exercise 2.10.8 it is pointed out that $p_3 = (p_2 + p_1^2)/2$; hence only $p_1$,
$p_2$, and $p_4$ are needed. The development is more natural, however, when we
include $p_3$.

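Theorem 2.5.1.

\[
\begin{aligned}
ET &= np_1 + \frac{n(n-1)}{2} p_2 \\
\operatorname{Var} T &= np_1(1 - p_1) + \frac{n(n-1)}{2} p_2(1 - p_2) \\
&\quad + 2n(n-1)(p_3 - p_1 p_2) + n(n-1)(n-2)(p_4 - p_2^2).
\end{aligned} \tag{2.5.2}
\]

Proof. Define, in accordance with Theorem 2.3.1,

\[
T_{ij} = \begin{cases} 1 & \text{if } (X_i + X_j)/2 > 0 \\ 0 & \text{otherwise,} \end{cases} \tag{2.5.3}
\]

and hence $T = \sum\sum_{i \le j} T_{ij}$, where the double summation is over subscripts
$i \le j$. We then have

\[
ET = \sum_{i} ET_{ii} + \sum\sum_{i<j} ET_{ij} = nET_{11} + \frac{n(n-1)}{2} ET_{12}.
\]

Now, $ET_{11} = P(X_1 > 0) = p_1$ and $ET_{12} = P(X_1 + X_2 > 0) = p_2$.
Next, we have

\[
\operatorname{Var} T = \operatorname{Var}\Bigl(\sum\sum_{i<j} T_{ij} + \sum_{i} T_{ii}\Bigr).
\]

Consider these sums written out:

\[
T_{11} + T_{12} + \cdots + T_{1n} + T_{22} + \cdots + T_{2n} + \cdots + T_{n-1,n-1} + T_{n-1,n} + T_{nn}. \tag{2.5.4}
\]

The variance of a sum is the sum of the variances plus twice the sum of
the covariances. There are three types of covariances we need to consider:
$\operatorname{Cov}(T_{11}, T_{12})$, $\operatorname{Cov}(T_{12}, T_{13})$, $\operatorname{Cov}(T_{12}, T_{34})$. From the independence of
$X_1, \ldots, X_n$, we first note that

\[
\operatorname{Cov}(T_{12}, T_{34}) = 0.
\]

The first type, $\operatorname{Cov}(T_{11}, T_{12})$, results from three matching subscripts.
There are $n$ ways to choose $T_{jj}$ (represented by $T_{11}$). Then $T_{jk}$ or $T_{kj}$ appears
$n - j$ times in the $j$th block of (2.5.4) and $j - 1$ times, once in each block
before the $j$th, for a total of $n - j + j - 1 = n - 1$. Hence there are
$n(n-1)$ covariances of the type $\operatorname{Cov}(T_{11}, T_{12})$.

The second type, $\operatorname{Cov}(T_{12}, T_{13})$, results from two matching subscripts.
There are $n(n-1)/2$ ways to choose $T_{ij}$, $i < j$ (represented by $T_{12}$). In the $j$th block of (2.5.4) there
are $n - j - 1$ terms that share a subscript with it, and there is one in each of the preceding $j - 1$
blocks, for a total of $(n - j - 1) + (j - 1) = n - 2$. Hence, there are
$n(n-1)(n-2)/2$ covariances of the type $\operatorname{Cov}(T_{12}, T_{13})$.

We can thus write the variance of $T$ as

\[
\operatorname{Var} T = n \operatorname{Var} T_{11} + \frac{n(n-1)}{2} \operatorname{Var} T_{12}
+ 2\left[n(n-1) \operatorname{Cov}(T_{11}, T_{12}) + \frac{n(n-1)(n-2)}{2} \operatorname{Cov}(T_{12}, T_{13})\right].
\]

To complete the argument, note that $\operatorname{Var} T_{11} = p_1 - p_1^2$
and $\operatorname{Cov}(T_{11}, T_{12}) = ET_{11}T_{12} - ET_{11}ET_{12} = p_3 - p_1 p_2$, and similarly
$\operatorname{Var} T_{12} = p_2 - p_2^2$ and $\operatorname{Cov}(T_{12}, T_{13}) = p_4 - p_2^2$.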
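A useful sanity check on (2.5.2) is that it should reduce to the null moments (2.2.4) when the distribution is continuous and symmetric about 0. The sketch below assumes the null values $p_1 = p_2 = 1/2$ and $p_4 = 1/3$; the last value is not stated in the text and is supplied here as an assumption, from the observation that $X_1 > \max(-X_2, -X_3)$ asks for $X_1$ to be the largest of three i.i.d. variables. The helper name `mean_and_variance` is illustrative.

```python
from fractions import Fraction


def mean_and_variance(n, p1, p2, p4):
    """Evaluate ET and Var T from (2.5.2), using p3 = (p2 + p1**2)/2."""
    p3 = (p2 + p1 ** 2) / 2  # identity quoted from Exercise 2.10.8
    pairs = Fraction(n * (n - 1), 2)
    mean = n * p1 + pairs * p2
    var = (
        n * p1 * (1 - p1)
        + pairs * p2 * (1 - p2)
        + 2 * n * (n - 1) * (p3 - p1 * p2)
        + n * (n - 1) * (n - 2) * (p4 - p2 ** 2)
    )
    return mean, var


half, third = Fraction(1, 2), Fraction(1, 3)  # assumed null values of p1 = p2 and p4
for n in range(1, 21):
    mean, var = mean_and_variance(n, half, half, third)
    assert mean == Fraction(n * (n + 1), 4)
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
print(mean_and_variance(4, half, half, third))
```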


Example 2.5.1. Define the class of absolutely continuous distribution
functions $H \in \Omega_p$ if $H(x) + H(-x) \le 1$ for all $x$ with strict inequality
for some interval of $x$ values. We refer to $\Omega_p$ as the class of stochastically
positive distributions since, for $x > 0$, $P(X > x) = 1 - H(x) \ge
H(-x) = P(X < -x)$, and consider the consistency of the Wilcoxon
signed rank test on $\Omega_p$. Let $X_1, \ldots, X_n$ be a random sample from a
distribution, $G(x)$, and let $\bar{T} = T/n(n+1)$; then from (2.5.2):

\[
E_G \bar{T} \to p_2/2, \qquad \operatorname{Var}_G \bar{T} \to 0. \tag{2.5.5}
\]

Hence $\bar{T}$ satisfies (1.3.2) with $\mu(G) = p_2/2$. Now,

\[
\begin{aligned}
2\mu(G) = p_2 = P(X_1 + X_2 > 0) &= \int_{-\infty}^{\infty} \int_{-x_2}^{\infty} g(x_1) g(x_2)\, dx_1\, dx_2 \\
&= \int_{-\infty}^{\infty} [1 - G(-x_2)] g(x_2)\, dx_2.
\end{aligned}
\]

If $G \in \Omega_s$, then $G(x) = 1 - G(-x)$ and $p_2 = \int G(x) g(x)\, dx = 1/2$. If
$G \in \Omega_p$ then $G(x) < 1 - G(-x)$ for some interval and $p_2 > 1/2$. From
Section 2.2, we have the required asymptotic normality, and hence we can
apply Theorem 1.3.1 to assert that $T$ is consistent for stochastically positive
alternatives.

Note the location model, given by $G(x) = F(x - \theta)$, $F \in \Omega_s$, is an
example of a stochastically positive distribution when $\theta > 0$. Hence the
Wilcoxon signed rank test is consistent for symmetric shift alternatives.

If the treatment acts to alter the symmetry of the control population and
leaves the median at 0, a stochastically positive distribution may result.
Then $T$ will still tend to fall in the critical region and reject $H_0\colon \theta = 0$.
Hence it is important to specify carefully the hypotheses under test. The
Wilcoxon signed rank test could just as well be considered a test for
symmetry if the location is fixed. See Fig. 2.1 for an example.

We now turn to the asymptotic distribution of $T$. Under $H_0\colon \theta = 0$,
$F \in \Omega_s$, we saw in Theorem 2.2.2 that $T$ is a linear combination of
independent random variables, and a form of the Central Limit Theorem
could be applied to establish the limiting normality. In the following simple
example, we show that this independence breaks down under the alternative
hypothesis.

Example 2.5.2. Suppose we have a sample $X_1, X_2$ from a uniform distribution
on $(-1, 2)$. Then the sample comes from $F(x - \theta)$, $F \in \Omega_s$, $\theta > 0$.
From the definition of $W_1$ and $W_2$, (2.2.1), we have $W_1 = 0$ and $W_2 = 1$
if and only if $-1 < Y_1 < 0 < -Y_1 < Y_2 < 2$, where $Y_1$ and $Y_2$ denote the
order statistics. Hence

\[
P(W_1 = 0, W_2 = 1) = \int_{-1}^{0} \int_{-y_1}^{2} \frac{2}{9}\, dy_2\, dy_1 = \frac{3}{9}.
\]

Likewise $P(W_1 = 0, W_2 = 0) = 1/9$, $P(W_1 = 1, W_2 = 0) = 1/9$, and $P(W_1
= 1, W_2 = 1) = 4/9$. The marginal probabilities are $P(W_1 = 0) = 4/9$,
$P(W_1 = 1) = 5/9$, $P(W_2 = 0) = 2/9$, $P(W_2 = 1) = 7/9$, and so, for example,

\[
P(W_1 = 0) P(W_2 = 1) = 28/81 \ne 27/81 = P(W_1 = 0, W_2 = 1).
\]
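The joint probabilities in Example 2.5.2 can also be approximated by simulation. The sketch below is a rough Monte Carlo check with an arbitrary seed and replication count, both assumptions made for illustration; it estimates the joint distribution of $(W_1, W_2)$ for samples of size 2 from the uniform distribution on $(-1, 2)$.

```python
import random


def simulate_joint(n_reps=200_000, seed=0):
    """Estimate P(W1 = w1, W2 = w2) for two observations from uniform(-1, 2)."""
    rng = random.Random(seed)
    counts = {(w1, w2): 0 for w1 in (0, 1) for w2 in (0, 1)}
    for _ in range(n_reps):
        # Order the pair by absolute value; W_j = 1 if the j-th smallest |X| is positive.
        small, large = sorted((rng.uniform(-1, 2), rng.uniform(-1, 2)), key=abs)
        counts[(int(small > 0), int(large > 0))] += 1
    return {key: value / n_reps for key, value in counts.items()}


joint = simulate_joint()
p_w1_zero = joint[(0, 0)] + joint[(0, 1)]
p_w2_one = joint[(0, 1)] + joint[(1, 1)]
print(joint[(0, 1)], p_w1_zero * p_w2_one)
```

The estimate of $P(W_1 = 0, W_2 = 1)$ should be near 1/3, while the product of the estimated marginals should be near 28/81; the two differ only in the second decimal place, so a fairly large number of replications is needed to see the dependence.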

In order to deal with $T$ in general, we resort to the method of projection.
Hajek (1968) has an excellent discussion of the technique. Essentially, the
statistic $T$ is projected, via conditional expectations given the $X_i$'s, onto the
class of sums of independent random variables. The asymptotic normality
of the projection follows from the Central Limit Theorem, and the asymptotic
normality of $T$ follows when the difference between the projection and
$T$ is small in probability. Hence we find the Central Limit Theorem
remains the main tool for determining the asymptotic normality. We
proceed by introducing, more formally, the idea of a projection and then
the idea of smallness in probability.

Theorem 2.5.2 (Projection). Suppose $X_1, \ldots, X_n$ is a random sample from
an arbitrary distribution, $H(x)$. Let $V = V(X_1, \ldots, X_n)$ be a random
variable such that $EV = 0$. If $W = \sum_{i=1}^{n} p_i(X_i)$, then $E(V - W)^2$ is minimized
by choosing the function $p_i(x)$ as

\[
p_i^*(x) = E(V \mid X_i = x). \tag{2.5.6}
\]

The random variable $V_p = \sum_{i=1}^{n} p_i^*(X_i)$ is called the projection of $V$ and

\[
E(V - V_p)^2 = \operatorname{Var} V - \operatorname{Var} V_p. \tag{2.5.7}
\]

Proof. By adding and subtracting $V_p$ we have

\[
E(V - W)^2 = E(V - V_p)^2 + E(V_p - W)^2 + 2E(V - V_p)(V_p - W).
\]

The cross product term is

\[
\begin{aligned}
E(V - V_p)(V_p - W) &= \sum_{i=1}^{n} E\{[p_i^*(X_i) - p_i(X_i)](V - V_p)\} \\
&= \sum_{i=1}^{n} E\{[p_i^*(X_i) - p_i(X_i)]\, E[V - V_p \mid X_i]\}
\end{aligned} \tag{2.5.8}
\]

and

\[
E[V - V_p \mid X_i] = E[V - p_i^*(X_i) \mid X_i] - \sum_{j \ne i} E[p_j^*(X_j) \mid X_i].
\]

But from the definition, (2.5.6),

\[
E[V - p_i^*(X_i) \mid X_i] = 0.
\]

Further,

\[
E[p_j^*(X_j) \mid X_i] = E p_j^*(X_j) = E\,E(V \mid X_j) = EV = 0.
\]

Hence (2.5.8) is 0, and

\[
E(V - W)^2 = E(V - V_p)^2 + E(V_p - W)^2
\]

which is minimized by choosing $W = V_p$.

If we choose $W = 0$, then (2.5.7) also follows and the proof is complete.

It should be noted that the same proof works if $X_1, \ldots, X_n$ are independent
but not identically distributed. See Hajek and Sidak (1967, p. 59) for
an example in which the random variables are dependent. In general the
$p_i^*(\cdot)$ will be different for different $i$. In most of the examples in this book,
however, they are identical because the $X_i$'s are identically distributed.

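Example 2.5.3. The projection method is illustrated by considering the
sample variance for a sample of size $n$ from a distribution with mean $\mu$ and
variance $\sigma^2$. The major application is given in Example 2.5.5.

\[
S^2 = \frac{1}{\binom{n}{2}} \sum\sum_{j<k} \frac{(X_j - X_k)^2}{2}. \tag{2.5.9}
\]

Since $ES^2 = \sigma^2$, define $V = S^2 - \sigma^2$. Then

\[
E\left[\frac{(X_j - X_k)^2}{2} \,\Big|\, X_i = x\right] =
\begin{cases}
\dfrac{x^2 - 2\mu x + \mu^2 + \sigma^2}{2} & \text{if } i = j \text{ or } k \\[1ex]
\sigma^2 & \text{otherwise,}
\end{cases} \tag{2.5.10}
\]

and hence

\[
p_i^*(x) = E(V \mid X_i = x) = \frac{1}{\binom{n}{2}} \sum\sum_{j<k} E\left[\frac{(X_j - X_k)^2}{2} - \sigma^2 \,\Big|\, X_i = x\right]
= \frac{(n-1)}{\binom{n}{2}} \cdot \frac{x^2 - 2\mu x + \mu^2 - \sigma^2}{2}.
\]

The $n - 1$ factor occurs because if $j = i$, then $i = j < k$ generates $n - i$
terms, and if $k = i$, then $j < k = i$ generates $i - 1$ terms for a total of
$n - i + i - 1 = n - 1$.

Hence the projection of the centered sample variance is

\[
V_p = \frac{n-1}{\binom{n}{2}} \sum_{i=1}^{n} \frac{X_i^2 - 2\mu X_i + \mu^2 - \sigma^2}{2}
= \frac{1}{n} \sum_{i=1}^{n} \left[(X_i - \mu)^2 - \sigma^2\right] \tag{2.5.12}
\]

which is an average of independent, identically distributed random variables.
Provided $EX^4 < \infty$, the Central Limit Theorem implies that $n^{1/2} V_p$ is
asymptotically normally distributed.

The strength of the projection method lies in showing next that
$n^{1/2}(S^2 - \sigma^2)$ and $n^{1/2} V_p$ have the same limiting distribution. We will need
the following result.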


Theorem 2.5.3. Suppose $W_n$ has a limiting $n(0, \sigma^2)$ distribution and
$E(U_n - W_n)^2 \to 0$. Then $U_n$ has the same $n(0, \sigma^2)$ limiting distribution
as $W_n$.

Proof. Define $R_n = U_n - W_n$; then in accordance with Theorem A3(a), we
can write $U_n = W_n + R_n$. Note that by Chebyshev's inequality, Theorem
A4,

\[
P(|R_n| > \epsilon) \le \frac{E(U_n - W_n)^2}{\epsilon^2} \to 0.
\]

Hence $R_n$ converges to 0 in probability, and the theorem follows from
Theorem A3(a) with $c = 0$.

In applications, $W_n$ is the projection of $U_n$ so that by Theorem 2.5.2,
$E(U_n - W_n)^2 = \operatorname{Var} U_n - \operatorname{Var} W_n$, and it is necessary then to show this
difference tends to zero.


Example 2.5.4. We complete the previous example now by showing that
$n^{1/2}(S^2 - \sigma^2)$ is approximately normally distributed.

We define $\mu_4 = E(X - \mu)^4$ and note from (2.5.12) that

\[
\operatorname{Var} n^{1/2} V_p = \mu_4 - \sigma^4.
\]

From Cramér (1946, p. 348)

\[
\operatorname{Var} n^{1/2} S^2 = \mu_4 - \frac{n-3}{n-1}\, \sigma^4. \tag{2.5.13}
\]

Because, from (2.5.7), $E(n^{1/2}S^2 - n^{1/2}\sigma^2 - n^{1/2}V_p)^2 = \operatorname{Var} n^{1/2}S^2 - \operatorname{Var} n^{1/2}V_p$,
which tends to zero, it follows from Theorem 2.5.3 that $n^{1/2}(S^2 - \sigma^2)$ has
the same limiting distribution as $n^{1/2}V_p$, namely, $n(0, \mu_4 - \sigma^4)$.

In the next example we apply these theorems to determine the asymptotic
distribution of $T$. The main result is stated at the end of the example
in Theorem 2.5.5. The strategy involves four steps: find the projection of $V$,
say $V_p$; argue the asymptotic normality of $V_p$; find the $\operatorname{Var} V$ and $\operatorname{Var} V_p$;
and show $\operatorname{Var} V - \operatorname{Var} V_p \to 0$.

Example 233. Let X„ ,X^ be A random sample from an arbitrary 
dvslnbuiion with absolutely conlmuous cdf H(x) We continue lo exploit 
the counting form for T, defined in (253) and wnte 

(2 5 14) 

,<J k 

From (2 5 2) define F « T - £r. hence 






Pi 


”P) 


^22(T,-Pi)*'2(r,,-pr) 

KJ * 


(2515) 


1. We first show that the projection is given by

\[ V_p = (n-1)\sum_{k=1}^{n} [1 - H(-X_k) - p_2] + \sum_{k=1}^{n} (T_{kk} - p_1). \tag{2.5.16} \]

The second term on the right side of (2.5.15) is just the centered sign
statistic. Since it is already a sum of independent, identically distributed
random variables, it is its own projection. We consider then the first term
on the right side of (2.5.15).

From the definition (2.5.3),

\[ E(T_{ij} - p_2 \mid X_k = x) =
   \begin{cases} 1 - H(-x) - p_2 & \text{if } k = i \text{ or } j, \\
                 0 & \text{otherwise.} \end{cases} \]

Hence

\[ \sum\sum_{i<j} E(T_{ij} - p_2 \mid X_k = x) = (n-1)[1 - H(-x) - p_2], \tag{2.5.17} \]

since there are only $n - 1$ nonzero terms in the sum. When $i = k$ then $j$
can assume the $n - k$ values greater than $i = k$, and when $j = k$ then $i$
can assume $k - 1$ values less than $k$, for a total of
$n - k + k - 1 = n - 1$ values. Now (2.5.16) follows from Theorem 2.5.2.

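2. We next show that $n^{-3/2}V_p$ is asymptotically distributed as
$n(0, p_4 - p_2^2)$. From (2.5.16) we are led to consider

\[ n^{-3/2}V_p = \frac{n-1}{n^{3/2}} \sum_{i=1}^{n} [1 - H(-X_i) - p_2]
   + \frac{1}{n^{3/2}} \sum_{i=1}^{n} (T_{ii} - p_1). \tag{2.5.18} \]

We first consider the term $n^{-1/2}\sum_{i=1}^{n} [1 - H(-X_i) - p_2]$. Let
$Y_i = 1 - H(-X_i)$; then

\[ EY_i = \int_{-\infty}^{\infty} (1 - H(-x))\,h(x)\,dx
        = \int_{-\infty}^{\infty}\int_{-x}^{\infty} h(y)h(x)\,dy\,dx
        = P(X_1 > -X_2) = p_2, \]

given in (2.5.1), where $X_1, X_2$ are i.i.d. $H(x)$. Furthermore

\[ EY_i^2 = \int_{-\infty}^{\infty} (1 - H(-x))^2 h(x)\,dx \le 1 < \infty. \]

Hence the Central Limit Theorem implies that this term is asymptotically
normally distributed with mean 0 and variance given by

\[ \begin{aligned}
\operatorname{Var}(1 - H(-X)) &= E(1 - H(-X))^2 - p_2^2 \\
 &= \int_{-\infty}^{\infty} (1 - H(-x))^2 h(x)\,dx - p_2^2 \\
 &= \int_{-\infty}^{\infty}\int_{-x}^{\infty}\int_{-x}^{\infty} h(y)h(z)h(x)\,dy\,dz\,dx - p_2^2 \\
 &= P(X_1 + X_2 > 0,\ X_1 + X_3 > 0) - p_2^2 \\
 &= p_4 - p_2^2.
\end{aligned} \]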
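Now write (2.5.18) as

\[ \frac{1}{n^{3/2}} V_p = \frac{1}{n^{1/2}} \sum_{i=1}^{n} (Y_i - p_2)
   - \frac{1}{n^{3/2}} \sum_{i=1}^{n} (Y_i - p_2)
   + \frac{1}{n^{3/2}} \sum_{i=1}^{n} (T_{ii} - p_1). \]

The second two terms are $n^{-1/2}$ times the averages of i.i.d. random
variables. Since $Y_i - p_2$ and $T_{ii} - p_1$ both have means 0 and variances
bounded above by 1, Chebyshev's inequality implies that they tend to zero
in probability. Hence Theorem 2.5.3 implies that $n^{-3/2}V_p$ is
asymptotically normally distributed with mean 0 and variance
$p_4 - p_2^2$. In fact, $\operatorname{Var} n^{-3/2}V_p \to p_4 - p_2^2$.

3. In order to show that $n^{-3/2}V = n^{-3/2}(T - ET)$ has the same
asymptotic distribution as $n^{-3/2}V_p$, we consider (by Theorem 2.5.2)

\[ E\bigl(n^{-3/2}V - n^{-3/2}V_p\bigr)^2
   = \frac{1}{n^3}\operatorname{Var} V - \frac{1}{n^3}\operatorname{Var} V_p. \]

Now, by (2.5.2),

\[ \frac{1}{n^3}\operatorname{Var} V \to p_4 - p_2^2, \]

and since $\operatorname{Var} n^{-3/2}V_p$ has already been shown in part 2 to
converge to $p_4 - p_2^2$, Theorem 2.5.3 implies that $n^{-3/2}(T - ET)$ is
asymptotically normally distributed with mean 0 and variance $p_4 - p_2^2$.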


We summarize the result in the following theorem.


Theorem 2.5.4. If $H(x)$ is an arbitrary, continuous distribution function
such that $0 < H(0) < 1$, then $(T - ET)/(\operatorname{Var} T)^{1/2}$ is
asymptotically $n(0, 1)$ with $\operatorname{Var} T$ given by (2.5.2).

Proof. Note that $p_4 - p_2^2 = \operatorname{Var}(1 - H(-X)) \ge 0$. This
variance is 0 only if the distribution of $-X$ is constant on the support of
$X$. As long as $0 < H(0) < 1$ this cannot happen; hence $p_4 - p_2^2 > 0$.
Hence we have

\[ \frac{T - ET}{n^{3/2}(p_4 - p_2^2)^{1/2}} \]

is asymptotically $n(0, 1)$, and the theorem follows since

\[ \frac{\operatorname{Var} T}{n^3 (p_4 - p_2^2)} \to 1. \]
In the location model, $H(x) = F(x - \theta)$, $F \in \Omega_s$. Hence, for
$\theta > 0$, $H(0) = F(-\theta)$, and if $0 < F(-\theta) < 1$, Theorem 2.5.4
implies that $(T - ET)/(\operatorname{Var} T)^{1/2}$ has a standard normal
limiting distribution with $ET$ and $\operatorname{Var} T$ given by (2.5.2).

That we need 0 in the support of $X$ is made intuitively reasonable by
thinking of a distribution $H(x)$ such that the support has been shifted to
the positive axis. In this case $T = n(n + 1)/2$ with probability 1, and this
degenerate random variable has variance 0. Thus, the condition is needed
to ensure that $T$ has a nondegenerate distribution.
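
The degenerate case can be seen directly from the counting form of $T$. The
following is a minimal sketch, assuming the counting form in which $T$ is the
number of pairs $i \le j$ with $X_i + X_j > 0$; the function name
`signed_rank_T` and the sample values are hypothetical and chosen only for
illustration.

```python
from itertools import combinations_with_replacement


def signed_rank_T(xs):
    """Counting form of the signed rank statistic: the number of pairs
    i <= j whose Walsh average (x_i + x_j)/2 is positive."""
    return sum(1 for a, b in combinations_with_replacement(xs, 2) if a + b > 0)


n = 6
# Hypothetical sample whose support lies entirely on the positive axis.
all_positive = [0.3, 1.2, 0.7, 2.5, 0.1, 4.0]
# Hypothetical sample with 0 inside the range of the data.
mixed = [-1.5, 0.4, 2.0, -0.2, 0.9, 3.1]

print(signed_rank_T(all_positive), n * (n + 1) // 2)  # both equal 21
print(signed_rank_T(mixed))                           # 16, strictly inside the range
```

For the all-positive sample every pair is counted, so $T$ sits at its maximum
$n(n+1)/2$ regardless of the particular values, which is the degeneracy
described above.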

Exercise 2.10.35 describes another application of the Projection Theo- 
rem. In general, the Projection Theorem provides the basis for the asymp- 
totic distribution theory of a large class of statistics called U statistics. 
These statistics can be expressed as symmetric functions of the observa- 
tions, and many estimates and test statistics can be written as U statistics. 
For further reading see Randles and Wolfe (1979, Chapter 3), Puri and Sen 
(1971, Chapter 3) or Lehmann (1975, Section 5 of the Appendix). 

We are now in a position to make a tentative comparison of the sign test 
and the Wilcoxon signed rank test. The fact that T is not distribution free 
under alternative hypotheses (Example 2.5.2) makes it extremely difficult to 
compute exact probabilities of T in general. We discuss these probabilities 
in more detail in the next chapter. In the meantime, Theorem 2.5.4 provides
the basis for approximate power calculations. The following paragraph will 
show that even the asymptotic approach leads to computational difficulties 
in the location model. These considerations motivate giving up power as a 
means of comparing tests in favor of asymptotic efficiency. 

Recall that we let $X_1, \ldots, X_n$ be a random sample from $F(x - \theta)$,
$F \in \Omega_s$. Since this is to be an asymptotic comparison of $S$ and $T$,
we put $T$ into a form that eliminates the higher-order terms from its mean
and variance.

As in Example 2.5.1, let

\[ \bar{T} = \frac{1}{n(n+1)}\, T. \]

Then, from (2.5.2), we have

\[ E\bar{T} \to \frac{p_2}{2} \]

and, from Theorem 2.5.4,

\[ \frac{n^{1/2}(\bar{T} - p_2/2)}{(p_4 - p_2^2)^{1/2}} \tag{2.5.19} \]

is approximately $n(0, 1)$. Hence, for large $n$, we need to compute $p_2$ and
$p_4$ in order to approximate the power.

For $S$, the sign statistic, from (1.2.3) and defining $\bar{S} = S/n$, we have

\[ \frac{n^{1/2}(\bar{S} - p_1)}{(p_1(1 - p_1))^{1/2}} \tag{2.5.20} \]

is approximately $n(0, 1)$. So, in addition to $p_2$ and $p_4$, we need to
compute $p_1$.

In Exercise 2.10.9 you are asked to compute $p_1, p_2, p_4$ for the uniform
distribution on $(-1, 2)$, and hence find the approximate power for the two
tests. If this is done for various sample sizes, we see two things emerge. On
the one hand $T$ appears to be more powerful than $S$, and on the other, both
have power approaching 1 as the sample sizes become large. This last point
reflects the consistency of the tests.

This sort of asymptotic power comparison is not very satisfying. We
need to assume large sample sizes in order to neglect the higher-order terms
in the means and variances and have an accurate normal approximation,
but then the power is generally close to one for these sample sizes. The
parameter $p_4$ is especially troublesome to compute and may be impossible
for many choices of $F$. The following example illustrates one more case
where it can, with some effort, be calculated.

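Example 2.5.6. Suppose $X_1, \ldots, X_n$ is a random sample from
$n(\theta, \sigma^2)$. Then

\[ p_4 = P(X_1 + X_2 > 0,\ X_1 + X_3 > 0)
       = P\!\left( \frac{X_1 + X_2 - 2\theta}{\sqrt{2}\,\sigma} > -\frac{\sqrt{2}\,\theta}{\sigma},\
                   \frac{X_1 + X_3 - 2\theta}{\sqrt{2}\,\sigma} > -\frac{\sqrt{2}\,\theta}{\sigma} \right). \]

Now a straightforward calculation shows that
$E(X_1 + X_2)(X_1 + X_3) = \sigma^2 + 4\theta^2$ and $\rho = 1/2$. Hence $p_4$
can be found in tables of the bivariate normal distribution with means 0,
variances 1, and correlation 1/2.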

As an example we take $n = 10$, $\alpha = .0527$ so the critical region is
determined by $T \ge 44$. In this case the power at $\theta = \sigma$ is
approximately $\Phi(1.475) = .93$ or $\Phi(1.586) = .94$, corrected for
continuity. The exact value, calculated by Klotz (1963), is .89. Hence the
power is overestimated when we neglect the higher-order terms. For these
calculations $p_1 = .8413$, $p_2 = .9213$, and $p_4 = .8657$, and when the full
expressions for the mean and variance of $T$ are used with the normal
approximation, it is seen that the power is approximated by .90.
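
The numbers in this example can be reproduced with a short calculation. The
sketch below assumes $\sigma = 1$ and $\theta = 1$, and instead of bivariate
normal tables it evaluates $p_4 = E(1 - H(-X))^2$ by Simpson's rule, using the
integral representation from Example 2.5.5. The helper names (`Phi`, `phi`,
`p4_normal`, `approx_power`) are hypothetical.

```python
import math


def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def p4_normal(theta, lo=-10.0, hi=10.0, steps=4000):
    """p4 = integral of (1 - H(-x))^2 h(x) dx with H(x) = Phi(x - theta),
    evaluated by Simpson's rule (steps must be even)."""
    def g(x):
        return (1.0 - Phi(-x - theta)) ** 2 * phi(x - theta)

    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3.0


theta, n = 1.0, 10                      # theta = sigma = 1
p1 = Phi(theta)                         # P(X1 > 0)
p2 = Phi(math.sqrt(2.0) * theta)        # P(X1 + X2 > 0)
p4 = p4_normal(theta)                   # P(X1 + X2 > 0, X1 + X3 > 0)


def approx_power(c):
    """Normal approximation (2.5.19) to P(T >= c), with T-bar = T/(n(n+1))."""
    z = math.sqrt(n) * (p2 / 2.0 - c / (n * (n + 1))) / math.sqrt(p4 - p2 ** 2)
    return z, Phi(z)


print(round(p1, 4), round(p2, 4), round(p4, 4))
print(approx_power(44.0))   # critical value 44
print(approx_power(43.5))   # with continuity correction
```

The two calls to `approx_power` correspond to the uncorrected and
continuity-corrected approximations quoted in the example.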

The approach of the power to 1 can be partly compensated for by 
choosing the alternative close to the null hypothesis. However, for any fixed 
alternative, this will ultimately fail for a consistent test. In the next 
calculation, we see that, for alternatives suitably close to the null hypothe- 
sis, it will no longer be necessaiy to compute the troublesome p^. This sort 
of approximation will then motivate the introduction of asymptotic effi- 
ciency which is a local (i.e., close to the null hypothesis) measure of power. 


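Example 2.5.7. Let $X_1, \ldots, X_n$ be a sample from $F(x - \theta)$,
$F \in \Omega_s$. For testing $H_0: \theta = 0$ versus $H_A: \theta > 0$ with
$T$, the approximate critical value $c$ is given by (2.2.9). Hence, for a
fixed alternative, the power is approximated by

\[ P_\theta(T \ge c) \doteq 1 - \Phi\!\left( \frac{c - ET}{(\operatorname{Var} T)^{1/2}} \right) \]

with $ET$ and $\operatorname{Var} T$ given by (2.5.2). Now, we write

\[ ET = \frac{n(n+1)}{4} + \frac{n(n-1)}{2}\left(p_2 - \frac{1}{2}\right)
      + n\left(p_1 - \frac{1}{2}\right) \tag{2.5.21} \]

and hence

\[ P_\theta(T \ge c) \doteq 1 - \Phi\!\left(
   \frac{Z_\alpha \sqrt{\dfrac{n(n+1)(2n+1)}{24}}
         - \dfrac{n(n-1)}{2}\left(p_2 - \dfrac{1}{2}\right)
         - n\left(p_1 - \dfrac{1}{2}\right)}
        {(\operatorname{Var} T)^{1/2}} \right). \tag{2.5.22} \]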

We now make a further approximation of $p_1$ and $p_2$. We expand these
functions of $\theta$ about $\theta = 0$, retain the linear terms, and neglect
the higher-order terms. This approximation is valid for small values of
$\theta$. Recall

\[ p_1 = P(X_1 > 0) = 1 - F(-\theta) \doteq 1 - F(0) + \theta f(0)
       = \frac{1}{2} + \theta f(0) \]

and $\frac{1}{2} - p_1 \doteq -\theta f(0)$. Likewise,

\[ p_2 = P(X_1 + X_2 > 0) = 1 - F^*(-2\theta) \doteq \frac{1}{2} + 2\theta f^*(0), \]

where $F^*$ is the convolution distribution, the cdf of $X_1 + X_2$, where
$X_1, X_2$ are i.i.d. $F(x)$, and $\frac{1}{2} - p_2 \doteq -2\theta f^*(0)$.
Furthermore, by continuity, if $\theta$ is close to 0,

\[ \operatorname{Var} T \doteq \frac{n(n+1)(2n+1)}{24}, \]

its value under $H_0: \theta = 0$. Substitution in (2.5.22) and (2.5.21) yields

\[ P(T \ge c) \doteq 1 - \Phi\!\left( Z_\alpha -
   \frac{n(n-1)\theta f^*(0) + n\theta f(0)}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}} \right). \tag{2.5.23} \]


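Note that $p_4$ is not necessary for the calculation of (2.5.23). We next
consider in more detail the convolution density $f^*$. If we suppose sufficient
regularity to pass derivatives through the integrals, then we have

\[ f^*(x) = \int_{-\infty}^{\infty} f(x - y) f(y)\,dy \]

and

\[ f^*(0) = \int_{-\infty}^{\infty} f^2(y)\,dy. \tag{2.5.24} \]

If $F$ is a $n(0, \sigma^2)$ distribution, then it is easy to see that

\[ f(0) = \frac{1}{\sqrt{2\pi}\,\sigma}, \qquad f^*(0) = \frac{1}{2\sqrt{\pi}\,\sigma}, \]

and

\[ P(T \ge c) \doteq 1 - \Phi\!\left( Z_\alpha -
   \frac{\dfrac{n(n-1)\theta}{2\sqrt{\pi}\,\sigma} + \dfrac{n\theta}{\sqrt{2\pi}\,\sigma}}
        {\sqrt{\dfrac{n(n+1)(2n+1)}{24}}} \right). \tag{2.5.25} \]

From Example 2.5.6, $n = 10$, $\alpha = .0527$, $Z_\alpha = 1.645$, and
$\theta = \sigma$ yield a value of .91 for (2.5.25).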
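
As a quick check of the value just quoted, the approximation (2.5.25) can be
evaluated directly. This is a minimal sketch for the normal model only; the
function name `local_power_normal` and its argument `theta_over_sigma`
(the ratio $\theta/\sigma$) are hypothetical.

```python
import math


def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def local_power_normal(n, theta_over_sigma, z_alpha):
    """Approximation (2.5.25) to the power of T under a n(theta, sigma^2) model."""
    null_sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    f_star0 = 1.0 / (2.0 * math.sqrt(math.pi))   # sigma * f*(0)
    f0 = 1.0 / math.sqrt(2.0 * math.pi)          # sigma * f(0)
    shift = (n * (n - 1) * f_star0 + n * f0) * theta_over_sigma / null_sd
    return 1.0 - Phi(z_alpha - shift)


print(round(local_power_normal(10, 1.0, 1.645), 2))  # 0.91
```

Setting `theta_over_sigma` to 0 returns $1 - \Phi(Z_\alpha)$, the nominal
level, as it should.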

The approximation (2.5.23) is valid for values of $\theta$ close to 0. Note
that if $\theta > 0$ is fixed as $n$ tends to infinity,

\[ P(T \ge c) \to 1 - \Phi(-\infty) = 1. \]

Hence if we wish to stabilize the power away from 1, we need to let $\theta$
approach 0 as $n$ increases. For large $n$, using (2.5.24), (2.5.23) can be
written

\[ P(T \ge c) \doteq 1 - \Phi\!\left( Z_\alpha - \sqrt{12}\left(\int f^2(x)\,dx\right) n^{1/2}\theta \right) \tag{2.5.26} \]

and hence, if we take the sequence $\theta_n = a/n^{1/2}$,

\[ 1 - \Phi\!\left( Z_\alpha - \sqrt{12}\left(\int f^2(x)\,dx\right) a \right) \tag{2.5.27} \]

can be considered the asymptotic local power of the Wilcoxon signed rank
test.
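
The stabilizing effect of the sequence $\theta_n = a/n^{1/2}$ can be
illustrated numerically. The sketch below assumes a standard normal $F$, so
that $f(0) = 1/\sqrt{2\pi}$ and $\int f^2 = 1/(2\sqrt{\pi})$; the function
names `power_2523` and `limit_2527` and the choice $a = 2$ are hypothetical.

```python
import math


def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


F0 = 1.0 / math.sqrt(2.0 * math.pi)        # f(0) for the standard normal
INT_F2 = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of f^2 = f*(0)


def power_2523(n, theta, z_alpha):
    """Approximation (2.5.23) for a standard normal F."""
    null_sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    shift = (n * (n - 1) * theta * INT_F2 + n * theta * F0) / null_sd
    return 1.0 - Phi(z_alpha - shift)


def limit_2527(a, z_alpha):
    """Asymptotic local power (2.5.27) for a standard normal F."""
    return 1.0 - Phi(z_alpha - math.sqrt(12.0) * INT_F2 * a)


a, z_alpha = 2.0, 1.645
for n in (10, 100, 10_000, 1_000_000):
    fixed = power_2523(n, 0.5, z_alpha)                # fixed alternative: tends to 1
    local = power_2523(n, a / math.sqrt(n), z_alpha)   # theta_n = a / sqrt(n): stabilizes
    print(n, round(fixed, 4), round(local, 4))
print("limit", round(limit_2527(a, z_alpha), 4))
```

Along the fixed alternative the approximate power runs up to 1, while along
$\theta_n$ it settles near the value given by (2.5.27).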

In Exercise 2.10.10, you are asked to find a similar expression for the
sign test. Then it is possible to make a local, asymptotic power comparison
of $T$ and $S$ for various underlying distributions. In that exercise, you
will see that for an underlying normal population, $T$ is more powerful than
$S$, but the reverse is true for an underlying double exponential (Laplace)
distribution. We can thus conclude that even though the sign test appears
to utilize very little information in the sample, it can be quite powerful even
for symmetric populations.

Figure 2.4. Stabilizing the power along a sequence of alternatives.

The preceding approximations provide a heuristic analysis in terms of
local power. It is clear that to stabilize the power away from one, it is
necessary to consider the power along a sequence of alternatives
$\{\theta_n\}$. The sequence suggested by the examples is
$\theta_n = a/n^{1/2}$, which approaches zero at the rate of $n^{-1/2}$. The
next section makes these ideas more formal and replaces the idea of local,
asymptotic power by asymptotic relative efficiency. See Fig. 2.4.

2.6. ASYMPTOTIC RELATIVE EFFICIENCY

The following development of asymptotic efficiency is due to Pitman and
was presented in a summer course that he taught at Columbia in 1949.
Noether (1955) gives the first published version. We begin with a definition
of efficiency in the case of finite sample sizes.

Definition 2.6.1. Let $V_n^{(i)}$, $i = 1, 2$, be size $\alpha$ tests of
$H_0: \theta = 0$ versus $H_A: \theta > 0$ based on $X_1, \ldots, X_n$, a random
sample from $F(x - \theta)$, $F$ fixed in $\Omega_0$.

For given $\theta$ and $\beta$, $\alpha < \beta < 1$, let $n^{(i)}$,
$i = 1, 2$, be the sample size required for the test based on $V^{(i)}$ to
have power $\beta$ at $\theta$. Then the efficiency of $V^{(1)}$ relative to
$V^{(2)}$ is

\[ e_{12}(\alpha, \beta, \theta) = \frac{n^{(2)}}{n^{(1)}}. \]

Hence, if $e_{12} = .5$, then $V^{(1)}$ requires about twice as many
observations as $V^{(2)}$ to obtain the same power. Note that we would need
the distribution of $V_n^{(i)}$, $i = 1, 2$, under the alternative hypothesis
in order to compute the efficiency. If we only consider large sample sizes to
avoid this problem, then we again have the problem of comparing two consistent
tests, both with approximate power equal to one. Recall the discussion of
Example 2.5.7 with Fig. 2.4. We have the following definition of asymptotic
relative efficiency.

Definition 2.6.2. Let $V_n^{(i)}$, $i = 1, 2$, provide two tests for
$H_0: \theta = 0$ versus $H_A: \theta > 0$ based on $X_1, \ldots, X_n$, a random
sample from $F(x - \theta)$, $F$ fixed in $\Omega_0$. Suppose the tests are
asymptotically size $\alpha$, that is,

\[ \lim_{n \to \infty} P_0\bigl(V_n^{(i)} \ge k_n^{(i)}\bigr) = \alpha, \qquad i = 1, 2. \]

For fixed $\beta$, $\alpha < \beta < 1$, suppose $\{\theta_j\}$ is a sequence
of alternatives such that $\theta_j \to 0$, with the corresponding sequences
$\{n_j^{(i)}\}$, $i = 1, 2$, of sample sizes such that

\[ \lim_{j \to \infty} P_{\theta_j}\bigl(V^{(i)} \ge k^{(i)}\bigr) = \beta, \qquad i = 1, 2. \]

Here we have suppressed the subscript $n_j^{(i)}$ on $V^{(i)}$. Now at
$\theta_j$, $n_j^{(2)}/n_j^{(1)}$ is the efficiency of $V^{(1)}$ relative to
$V^{(2)}$, and when the limit

\[ e_{12} = \lim_{j \to \infty} \frac{n_j^{(2)}}{n_j^{(1)}} \tag{2.6.1} \]

exists and is independent of $\{\theta_j\}$, $\alpha$, and $\beta$, it is
called the asymptotic relative efficiency of $V^{(1)}$ relative to $V^{(2)}$.

As we shall see, this asymptotic efficiency is easy to calculate and the
limit is often approached quite rapidly. This means that $e_{12}$ gives a good
indication of the ratio of sample sizes necessary to attain the same level and
power for alternatives close to the null hypothesis. Note also this efficiency,
as expected, does depend on the $F$ fixed in $\Omega_0$.


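Example 2.6.1. In this example we give still another heuristic comparison
of $T$ and $S$. Under the same assumptions and notation of Example 2.5.7,
we see from (2.5.26) that the Wilcoxon signed rank test has approximate
power given by

\[ 1 - \Phi\!\left( Z_\alpha - \sqrt{12}\left(\int f^2(x)\,dx\right) n^{1/2}\theta \right). \]

If $\beta$, $\alpha < \beta < 1$, is fixed, then the requirement that the
power be approximately $1 - \Phi(Z_\beta) = \beta$ means

\[ Z_\beta = Z_\alpha - \sqrt{12}\, n^{1/2}\theta \int f^2(x)\,dx. \]

Solving for $n$ we have

\[ n = \frac{(Z_\alpha - Z_\beta)^2}{\theta^2\, 12 \left(\int f^2(x)\,dx\right)^2}. \tag{2.6.2} \]

If the sequence of alternatives is $\{\theta_j\}$, then the sample sizes for
$T$ can be defined by (2.6.2). Likewise, from Exercise 2.10.11 for the sign
test, we have

\[ n = \frac{(Z_\alpha - Z_\beta)^2}{\theta^2\, 4 f^2(0)}. \tag{2.6.3} \]

Hence, for the same $\alpha$, $\beta$, and $\{\theta_j\}$, the ratio of the
sample sizes in (2.6.2) and (2.6.3),

\[ \frac{4 f^2(0)}{12\left(\int f^2(x)\,dx\right)^2} = \frac{f^2(0)}{3\left(\int f^2(x)\,dx\right)^2}, \]

yields the asymptotic efficiency. If $f$ is the standard normal density we
have $e_{12} = 2/3$. Hence, when sampling from a normal distribution with
alternatives close to the null hypothesis, the Wilcoxon signed rank test
requires about 2/3 as many observations as the sign test to attain the same
level and power.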

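We now state a set of regularity conditions, due to Pitman, that make
the calculation of efficiency quite easy in many cases. We suppose that $V_n$
is a statistic for testing $H_0: \theta = 0$ versus $H_A: \theta > 0$ with
critical region $V_n \ge k_n$.

1. $V_n$ provides a consistent test.

2. There exist sequences $\{\mu_n(\theta)\}$ and $\{\sigma_n(\theta)\}$ such
that $(V_n - \mu_n(\theta))/\sigma_n(\theta)$ is asymptotically $n(0, 1)$,
uniformly in a neighborhood of $\theta = 0$.

3. The derivative $\mu_n'(\theta) = \frac{d}{d\theta}\mu_n(\theta)$ exists in
a neighborhood of $\theta = 0$.

4. For a sequence $\{\theta_n\}$ such that $\theta_n \to 0$ as $n \to \infty$,

\[ \frac{\sigma_n(\theta_n)}{\sigma_n(0)} \to 1 \quad\text{and}\quad
   \frac{\mu_n'(\theta_n)}{\mu_n'(0)} \to 1 \quad\text{as } n \to \infty. \]

5.

\[ \lim_{n \to \infty} \frac{\mu_n'(0)}{n^{1/2}\sigma_n(0)} = c > 0. \]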

Definition 2.6.3. The quantity $c$ is called the efficacy of the test based on
$V_n$. For large $n$, it measures the rate of change in standard units of the
"asymptotic" mean of $V_n$ at the null hypothesis. A test with relatively large
efficacy $c$ is responding rapidly to alternatives and might be expected to
have good local power properties.

The conditions 3 and 4 are smoothness conditions on the parameters of
$V_n$ as functions of $\theta$. Often, the statistic is constructed so that
$V_n \to \mu(\theta)$ in probability, or $EV_n \to \mu(\theta)$, and
$n \operatorname{Var} V_n \to \sigma^2(\theta)$, so that we can take
$\mu_n(\theta) = \mu(\theta)$ for all $n$ and
$\sigma_n(\theta) = \sigma(\theta)/n^{1/2}$. Then condition 2 becomes:
$n^{1/2}(V_n - \mu(\theta))/\sigma(\theta)$ is asymptotically $n(0, 1)$,
uniformly in $\theta$ near 0; and the efficacy is easily computed from the
asymptotic parameters $\mu(\theta)$ and $\sigma(\theta)$ as
$c = \mu'(0)/\sigma(0)$.

The conditions 1 and 2 are asymptotic distribution conditions. The
uniform convergence to normality, condition 2, in a neighborhood of the
null hypothesis requires slightly stronger central limit theorems. An
important tool for establishing the uniform convergence is the Berry-Esseen
Theorem A14; it is illustrated on the sign test.

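Example 2.6.2. Let $X_1, \ldots, X_n$ be a random sample from
$F(x - \theta)$, $F \in \Omega_0$, with $f(0) < \infty$. For testing
$H_0: \theta = 0$ versus $H_A: \theta > 0$, we use the sign test defined in
(1.2.1) as $S = \sum s(X_i)$ with $p(\theta) = P_\theta(X > 0) = 1 - F(-\theta)$
defined in (1.2.2).

In the notation of the Pitman conditions, we let $\mu_n(\theta) = np(\theta)$
and $\sigma_n^2(\theta) = np(\theta)(1 - p(\theta))$. The smoothness conditions
are then just smoothness conditions on the underlying distribution function.
We apply the Berry-Esseen Theorem A14 to $s(X_1), \ldots, s(X_n)$; then, with
$q(\theta) = 1 - p(\theta)$,

\[ E|s(X) - p(\theta)|^3 = p(\theta)q(\theta)\,[p^2(\theta) + q^2(\theta)]. \]

Hence we see, for a constant $K$,

\[ \sup_t \left| P_\theta\!\left( \frac{S - np(\theta)}{(np(\theta)q(\theta))^{1/2}} \le t \right) - \Phi(t) \right|
   \le \frac{K\,[p^2(\theta) + q^2(\theta)]}{n^{1/2}\,(p(\theta)q(\theta))^{1/2}}. \tag{2.6.5} \]

Suppose, in a neighborhood of $\theta = 0$, that the variance
$\sigma^2(\theta) = p(\theta)q(\theta) \ge M > 0$, that is, it is bounded away
from 0. Then, since $p^2(\theta) + q^2(\theta) \le 1$, the right side of
(2.6.5) is at most

\[ \frac{K}{(nM)^{1/2}} \to 0 \]

uniformly in $\theta$ in the neighborhood. Hence the sign test converges
uniformly to normality.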

The condition that $p(\theta)q(\theta) \ge M > 0$ in a neighborhood is not
unreasonable since, at $\theta = 0$, $p(0) = q(0) = 1/2$, and we have a
continuous distribution. In typical applications we will need an upper bound
on the third absolute moment and a positive lower bound on the second moment.
See Theorem A15 in the Appendix. When this is the case, we will be able to
show the projection (Theorem 2.5.2) converges to normality uniformly, and
an application of Slutsky's Theorem A3 will complete the argument.

Bounding the moments from above is generally not a difficulty for rank
statistics, whose moments are expressed in terms of probabilities, which are
already bounded above by 1. (See Theorem 2.5.1 for an example.) Hence
essentially the only new point to be considered is bounding the variance away
from 0, which occurs when the asymptotic variance is continuous in $\theta$
near 0 and positive at $\theta = 0$. We now proceed with the development of
asymptotic relative efficiency.

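Theorem 2.6.1. Suppose $V_n$ satisfies the regularity conditions 1 through 5
and provides an asymptotically size $\alpha$ test of $H_0: \theta = 0$ versus
$H_A: \theta > 0$. Let $\theta_n = \theta/n^{1/2}$ for $\theta > 0$, fixed.
Then the asymptotic power is given by

\[ \lim_{n \to \infty} P_{\theta_n}(V_n \ge k_n) = 1 - \Phi(Z_\alpha - \theta c) \tag{2.6.6} \]

where

\[ \lim_{n \to \infty} P_0(V_n \ge k_n) = 1 - \Phi(Z_\alpha) = \alpha, \tag{2.6.7} \]

and $c$ is the efficacy defined in condition 5.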

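Proof. Using the regularity conditions and Theorem A1, we have, for
$\epsilon > 0$ and for $n$ sufficiently large,

\[ \left| P_\theta(V_n \ge k_n) - \left[ 1 - \Phi\!\left( \frac{k_n - \mu_n(\theta)}{\sigma_n(\theta)} \right) \right] \right| < \epsilon, \tag{2.6.8} \]

for all $\theta$ in a neighborhood of 0. Next we expand $\mu_n(\theta)$ about
0 to get

\[ \mu_n(\theta) = \mu_n(0) + \theta\mu_n'(\theta^*), \]

where $0 < \theta^* < \theta$. Hence the argument of $\Phi$ becomes

\[ \frac{k_n - \mu_n(\theta)}{\sigma_n(\theta)}
   = \frac{k_n - \mu_n(0)}{\sigma_n(\theta)} - \frac{\theta\mu_n'(\theta^*)}{\sigma_n(\theta)}. \tag{2.6.9} \]

By the uniform convergence condition 2, for sufficiently large $n$,
$\theta_n = \theta/n^{1/2}$ will be in the neighborhood of 0 and can be
inserted into (2.6.9). Multiplying and dividing the right side of (2.6.9) by
$\sigma_n(0)$ and replacing $\theta$ by $\theta_n$ yields

\[ \frac{k_n - \mu_n(\theta_n)}{\sigma_n(\theta_n)}
   = \left[ \frac{k_n - \mu_n(0)}{\sigma_n(0)}
          - \theta\,\frac{\mu_n'(\theta_n^*)}{n^{1/2}\sigma_n(0)} \right]
     \frac{\sigma_n(0)}{\sigma_n(\theta_n)}, \]

where $0 < \theta_n^* < \theta_n$. We now have, from (2.6.7) together with
conditions 4 and 5, that

\[ \frac{k_n - \mu_n(\theta_n)}{\sigma_n(\theta_n)} \to Z_\alpha - \theta c. \]

Hence (2.6.6) follows upon replacing $\theta$ by $\theta_n$ in (2.6.8).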


Thus we can formally approximate the power of a consistent test near
the null hypothesis. This was the point of the discussion of Example 2.5.7.
The power approximation in (2.6.6) requires the asymptotic variance under
$H_0$ and the asymptotic mean under $H_A$ to compute the efficacy $c$. The
results of Theorem 2.6.1 can be expressed in a different form. The limit
(2.6.7) can be rewritten, with $k_n = \mu_n(0) + t\sigma_n(0)$, as

\[ P_0\!\left( \frac{V_n - \mu_n(0)}{\sigma_n(0)} \le t \right) \to \Phi(t) \]

or $[V_n - \mu_n(0)]/\sigma_n(0) \xrightarrow{D_0} Z \sim n(0, 1)$, where
$\xrightarrow{D_0}$ denotes convergence in distribution when $\theta = 0$.
Likewise (2.6.6) becomes, with $\theta_n = \theta/n^{1/2}$,

\[ P_{\theta_n}\!\left( \frac{V_n - \mu_n(0)}{\sigma_n(0)} \le t \right) \to \Phi(t - \theta c) \]

or $[V_n - \mu_n(0)]/\sigma_n(0) \xrightarrow{D_{\theta_n}} Z \sim n(\theta c, 1)$.
Hence the asymptotic distribution of $[V_n - \mu_n(0)]/\sigma_n(0)$ changes
only in the mean, from 0 to $\theta c$, when the alternative is allowed to
converge to 0. We now develop the asymptotic efficiency of two tests.

Theorem 2.6.2. Let $V_n^{(i)}$, $i = 1, 2$, provide two tests of
$H_0: \theta = 0$ versus $H_A: \theta > 0$. Suppose they both satisfy the
regularity conditions 1-5 and further satisfy the conditions of Definition
2.6.2. Then the asymptotic relative efficiency of $V^{(1)}$ relative to
$V^{(2)}$ is

\[ e_{12} = \frac{c_1^2}{c_2^2} \tag{2.6.10} \]

where $c_i$ is the efficacy of $V^{(i)}$, $i = 1, 2$.

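Proof. For the moment we will suppress the superscript and consider $V_n$.
The power converges according to Definition 2.6.2, and hence the standardized
critical value satisfies

\[ \frac{k_n - \mu_n(\theta_n)}{\sigma_n(\theta_n)} \to \text{constant}. \]

Multiply and divide by $\sigma_n(0)$, as in the proof of Theorem 2.6.1, to get

\[ \left[ \frac{k_n - \mu_n(0)}{\sigma_n(0)}
        - n^{1/2}\theta_n\,\frac{\mu_n'(\theta_n^*)}{n^{1/2}\sigma_n(0)} \right]
   \frac{\sigma_n(0)}{\sigma_n(\theta_n)} \to \text{constant} \]

and

\[ n^{1/2}\theta_n = \text{constant} + o(1). \tag{2.6.11} \]

If we let $\theta$ denote this constant, then we see that the convergence of
the power implies the form $\theta_n = \theta/n^{1/2} + o(1)/n^{1/2}$.

Now from Theorem 2.6.1, we have, for $i = 1, 2$,

\[ \lim P_{\theta_j}\bigl(V^{(i)} \ge k^{(i)}\bigr) = 1 - \Phi\bigl(Z_\alpha - \theta^{(i)} c_i\bigr). \]

Since these powers must be the same,

\[ \theta^{(1)} c_1 = \theta^{(2)} c_2. \]

Furthermore, from (2.6.11), since the sequence $\{\theta_j\}$ must be the same
(see Definition 2.6.2),

\[ \frac{\theta^{(1)}}{\bigl(n_j^{(1)}\bigr)^{1/2}}\,(1 + o(1))
   = \frac{\theta^{(2)}}{\bigl(n_j^{(2)}\bigr)^{1/2}}\,(1 + o(1)), \]

which can be rewritten as

\[ \left( \frac{n_j^{(2)}}{n_j^{(1)}} \right)^{1/2}
   = \frac{\theta^{(2)}}{\theta^{(1)}} + o(1) = \frac{c_1}{c_2} + o(1). \]

Now $n_j^{(2)}/n_j^{(1)} \to c_1^2/c_2^2$, which completes the proof.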
Example 2.6.3. Let $X_1, \ldots, X_n$ be a random sample from
$F(x - \theta)$, $F \in \Omega_s$. The sign statistic $S$, (1.2.1), has efficacy

\[ c_S = 2f(0). \tag{2.6.12} \]

This follows immediately from Definition 2.6.3 with
$\mu_n(\theta) = ES = n(1 - F(-\theta))$, $\mu_n'(0) = nf(0)$ and
$\sigma_n^2(0) = n/4$.

The Wilcoxon signed rank statistic $T$, (2.2.2), has efficacy

\[ c_T = \sqrt{12} \int f^2(x)\,dx. \tag{2.6.13} \]

To see this, first note from (2.5.2), expressing $p_1$ and $p_2$ in terms of
$F$, we have

\[ \mu_n(\theta) = n(1 - F(-\theta))
   + \frac{n(n-1)}{2} \int [1 - F(-y - 2\theta)]\,f(y)\,dy. \tag{2.6.14} \]

Hence

\[ \mu_n'(0) = nf(0) + n(n-1) \int f^2(x)\,dx. \]

Now using $\sigma_n^2(0)$ given in (2.2.4) and Definition 2.6.3, the equation
for $c_T$ follows. The differentiation under the integral in (2.6.14) is
justified by Theorem A17 in the Appendix. We only need to suppose that $F$ is
absolutely continuous and $\int f^2(x)\,dx < \infty$. The uniform convergence
to normality, condition 2, is discussed in Exercise 2.10.12.

The $t$ statistic is $t = n^{1/2}\bar{X}/S$, where $\bar{X}$, $S$ are the
sample mean and standard deviation, respectively. The efficacy is

\[ c_t = \frac{1}{\sigma_f} \tag{2.6.15} \]

where $\sigma_f^2 = \int x^2 f(x)\,dx$ is the variance of $F$. This follows
from $\mu_n(\theta) = n^{1/2}\theta/\sigma_f$ and $\sigma_n(0) = 1$ and
applying the definition of efficacy.

We are now ready to list the Pitman efficiency equations for the various
tests. Recall the efficiency of $T^{(1)}$ relative to $T^{(2)}$ is
$e(T^{(1)}, T^{(2)}) = c_1^2/c_2^2$.
\[ e(S, T) = \frac{f^2(0)}{3\left(\int f^2(x)\,dx\right)^2} \tag{2.6.16} \]

\[ e(S, t) = 4\sigma_f^2 f^2(0) \tag{2.6.17} \]

\[ e(T, t) = 12\sigma_f^2 \left(\int f^2(x)\,dx\right)^2 \tag{2.6.18} \]

It is pointed out in Exercise 2.10.13 that these efficiencies are scale
invariant. Hence the value of the efficiency is independent of the value of
$\sigma_f^2$, the variance of the underlying distribution.
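
The three efficiency formulas are easy to evaluate numerically for a given
symmetric density. The following is a minimal sketch, assuming a density
symmetric about 0 so that the integrals can be taken over $[0, \infty)$ and
doubled; the helper names (`simpson`, `pitman_efficiencies`) and the
truncation point `upper` are hypothetical choices, not part of the text.

```python
import math


def simpson(g, lo, hi, steps=20000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3.0


def pitman_efficiencies(f, upper=40.0):
    """Return (e(S,T), e(S,t), e(T,t)) from (2.6.16)-(2.6.18) for a density f
    that is symmetric about 0; the tails beyond `upper` are ignored."""
    int_f2 = 2.0 * simpson(lambda x: f(x) ** 2, 0.0, upper)
    var_f = 2.0 * simpson(lambda x: x * x * f(x), 0.0, upper)
    f0 = f(0.0)
    e_ST = f0 ** 2 / (3.0 * int_f2 ** 2)
    e_St = 4.0 * var_f * f0 ** 2
    e_Tt = 12.0 * var_f * int_f2 ** 2
    return e_ST, e_St, e_Tt


def normal(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def laplace(x):
    return 0.5 * math.exp(-abs(x))


print([round(e, 3) for e in pitman_efficiencies(normal)])   # about 0.667, 0.637, 0.955
print([round(e, 3) for e in pitman_efficiencies(laplace)])  # about 1.333, 2.0, 1.5
```

For the normal density this reproduces $e(S,T) = 2/3$ from Example 2.6.1, and
for the Laplace density the sign test comes out ahead of both $T$ and $t$.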

Example 2.6.4. Let $X_1, \ldots, X_n$ be a random sample from
$F_\epsilon(x) = (1 - \epsilon)\Phi(x) + \epsilon\Phi(x/3)$. This corresponds
to sampling from a distribution that is a mixture of $n(0, 1)$ and $n(0, 9)$.
For small values of $\epsilon$, most of the observations come from $n(0, 1)$
with occasional large observations from $n(0, 9)$. This model is called the
contaminated normal model. It was first treated in detail by Tukey (1960) and
provides a model that is very difficult to distinguish from the normal model
for small to moderate values of $\epsilon$. Furthermore, contamination may
have its greatest impact for small values of $\epsilon$. For example, Tukey
points out that the two distributions contribute equal amounts to the variance
of $F_\epsilon(x)$ when $\epsilon = .10$.

We will compute the efficiency of $T$ relative to $t$ for this model. We have

\[ f_\epsilon(x) = (1 - \epsilon)\phi(x) + \frac{\epsilon}{3}\,\phi\!\left(\frac{x}{3}\right) \]

where $\phi(x)$ is the $n(0, 1)$ density function. Hence,

\[ \int f_\epsilon^2(x)\,dx = \frac{(1 - \epsilon)^2}{2\sqrt{\pi}}
   + \frac{\epsilon^2}{6\sqrt{\pi}} + \frac{\epsilon(1 - \epsilon)}{\sqrt{5\pi}}, \]

$\sigma_f^2 = 1 + 8\epsilon$, and

\[ e(T, t) = \frac{3(1 + 8\epsilon)}{\pi}
   \left[ (1 - \epsilon)^2 + \frac{\epsilon^2}{3} + \frac{2\epsilon(1 - \epsilon)}{\sqrt{5}} \right]^2. \tag{2.6.19} \]
Table 2.3. Efficiency of T Relative to t

  epsilon    0      .01    .03    .05    ?      .10    .15
  e(T,t)     0.955  1.009  1.108  1.196  1.301  1.373  1.497


Table 2.3 illustrates (2.6.19). Note that when $\epsilon = 0$, we are simply
sampling from a normal distribution and the efficiency of $T$ relative to $t$
is 0.955. This means that the loss of efficiency due to using the rank test
$T$ rather than the optimal $t$ test is negligible. Furthermore, for mild
contamination, such as $\epsilon = .05$, the rank test $T$ can be
substantially more efficient (20%) than the $t$ test. In fact, this example
shows that the $t$ test loses its optimality quite rapidly as we move from the
normal model into a neighborhood of the normal model.
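
The efficiency for the contaminated normal model is a closed-form function of
$\epsilon$ and can be tabulated directly. This is a minimal sketch of the
formula for $e(T,t)$ in Example 2.6.4 (mixture of $n(0,1)$ and $n(0,9)$); the
function name `eff_T_vs_t` and the grid of $\epsilon$ values are hypothetical
choices.

```python
import math


def eff_T_vs_t(eps):
    """e(T, t) for F = (1 - eps) * n(0,1) + eps * n(0,9), using
    sigma_f^2 = 1 + 8*eps and the closed form of the integral of f^2."""
    bracket = (1.0 - eps) ** 2 + eps ** 2 / 3.0 + 2.0 * eps * (1.0 - eps) / math.sqrt(5.0)
    return 3.0 * (1.0 + 8.0 * eps) / math.pi * bracket ** 2


for eps in (0.0, 0.01, 0.03, 0.05, 0.10, 0.15):
    print(eps, round(eff_T_vs_t(eps), 3))
```

At $\epsilon = 0$ the function returns $3/\pi$, the normal-model efficiency,
and it climbs steadily as the contamination proportion grows.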

In Exercise 2.10.14, you are asked to compute the efficiencies $e(T, t)$ for
various models. The implications of these results are that $T$ is highly
efficient for a broad selection of models. When the tails of the underlying
distribution are sufficiently heavy, say close to a double exponential or
Laplace distribution, $S$ is more efficient than either $t$ or $T$. In the
examples considered, in which the underlying distribution has tails heavier
than a normal distribution, $t$ is never very efficient.

The examples and exercises compare the rank test $T$ to the $t$ test for
several specific underlying distributions and two fairly broad families of
models. The question still remains whether there are underlying distributions
for which the $t$ test is highly superior to $T$. The answer is in the
negative, as the next theorem due to Hodges and Lehmann (1956) shows.

Theorem 2.6.3. Let $X_1, \ldots, X_n$ be a random sample from
$F(x - \theta)$, $F \in \Omega_s$. Then

\[ \inf_{\Omega_s} e(T, t) = 0.864. \tag{2.6.20} \]

This theorem shows that no matter what the underlying distribution, the
efficiency of $T$ relative to $t$ is never less than 0.864. We have already
seen in Exercise 2.10.14 that $e(T, t)$ can be made arbitrarily large.

Proof. From (2.6.18), $e(T, t) = 12\sigma_f^2 (\int f^2(x)\,dx)^2$. If
$\sigma_f^2 = \infty$ then clearly $e(T, t) > 0.864$; hence we can restrict
attention to $F \in \Omega_s$ such that $\sigma_f^2 < \infty$. Furthermore, by
Exercise 2.10.13, $e(T, t)$ is scale invariant, so without loss of generality
we will only consider $F \in \Omega_s$ such that $\sigma_f^2 = 1$. In the
following argument we suppress the dummy variable of integration.
The problem, then, is to minimize $\int f^2$ subject to
$\int f = \int x^2 f = 1$ and $\int x f = 0$. This is equivalent to minimizing

\[ \int f^2 + 2b \int x^2 f - 2ba^2 \int f \tag{2.6.21} \]

where $a, b$ are positive constants to be determined later. We now write
(2.6.21) as

\[ \int_{|x| \le a} [f^2 + 2b(x^2 - a^2) f] + \int_{|x| > a} [f^2 + 2b(x^2 - a^2) f]. \tag{2.6.22} \]

First complete the square on the first term on the right side of (2.6.22) to
get

\[ \int_{|x| \le a} [f + b(x^2 - a^2)]^2 - \int_{|x| \le a} b^2 (x^2 - a^2)^2. \tag{2.6.23} \]

Now (2.6.22) is equal to the two terms of (2.6.23) plus the second term on
the right side of (2.6.22). We can now write down the density $f$ that
minimizes (2.6.21).

If $|x| > a$ take $f(x) = 0$, since $x^2 > a^2$, and if $|x| \le a$ take
$f(x) = b(a^2 - x^2)$, since the integral in the first term of (2.6.23) is
nonnegative.

We now determine the values of $a$ and $b$ from the side conditions. From
$\int f = 1$ we have

\[ \int_{-a}^{a} b(a^2 - x^2)\,dx = 1 \]

which implies that $a^3 b = 3/4$. Further, from $\int x^2 f = 1$ we have

\[ \int_{-a}^{a} x^2 b(a^2 - x^2)\,dx = 1 \]

from which $a^5 b = 15/4$. Hence solving for $a$ and $b$ yields
$a = 5^{1/2}$ and $b = [3(5)^{1/2}]/100$. Now,

\[ \int f^2 = \int_{-\sqrt{5}}^{\sqrt{5}} \left[ \frac{3\sqrt{5}}{100} \right]^2 (5 - x^2)^2\,dx = \frac{3\sqrt{5}}{25}, \]

and to complete the proof of (2.6.20), note that

\[ 12 \left( \frac{3\sqrt{5}}{25} \right)^2 = \frac{108}{125} = 0.864. \]

In summary, we see from the examples, exercises, and the last theorem
that

    e(T, t) = 0.955 for underlying normal model
            = 1.50 for underlying Laplace model
            >= 0.864 for all underlying symmetric models.

This would seem to suggest the price of an optimal test at the normal model
is too high to pay. We would be better off using the suboptimal rank test
which is more stable over the whole class of symmetric models.
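
The lower bound in Theorem 2.6.3 can be checked numerically for the minimizing
density $f(x) = b(a^2 - x^2)$ on $|x| \le a$, with $a = 5^{1/2}$ and
$b = 3(5)^{1/2}/100$ as derived in the proof. The sketch below is only a
numerical confirmation under those values; the helper name `simpson` is
hypothetical.

```python
import math


def simpson(g, lo, hi, steps=2000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3.0


a = math.sqrt(5.0)
b = 3.0 * math.sqrt(5.0) / 100.0


def f(x):
    """Minimizing density from the proof: a parabola on [-a, a], zero outside."""
    return b * (a * a - x * x) if abs(x) <= a else 0.0


mass = simpson(f, -a, a)                            # side condition: equals 1
variance = simpson(lambda x: x * x * f(x), -a, a)   # side condition: equals 1
int_f2 = simpson(lambda x: f(x) ** 2, -a, a)        # equals 3*sqrt(5)/25
efficiency = 12.0 * variance * int_f2 ** 2          # (2.6.18)

print(round(mass, 6), round(variance, 6), round(int_f2, 6), round(efficiency, 3))
```

The computed efficiency agrees with $108/125 = 0.864$, below the normal-model
value $3/\pi$.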

The two major criticisms of this type of efficiency are that it provides an
asymptotic comparison valid only for very large samples, and it is a local
comparison, valid only in a neighborhood of the null hypothesis. The latter
simply points to the fact that a single number such as efficiency is not
sufficient to describe the relative behavior of tests over the entire
alternative. Bahadur (1967) efficiency does provide an efficiency curve
similar to a power curve. For many cases of interest, as the alternative
approaches the null hypothesis, the Bahadur efficiency approaches the Pitman
efficiency. See Klotz (1965) for further reading on the Bahadur efficiency and
a comparison of $S$, $T$, and $t$. Klotz shows that the Bahadur efficiency of
$T$ relative to $t$ rises from 0.955 to around 0.98 at a location of
$0.75\sigma_f$ and then decreases down to around 0.60 for very large location
values.

The former criticism is considered by Klotz (1963), in which he studies
the efficiency of $T$ relative to $t$ for finite $n$. For $T$ we fix a sample
size $n$, significance level $\alpha$ and underlying distribution $F$, and a
location $\theta$. In the case of an underlying normal distribution it is
possible to find the exact power of $T$. (The general problem of computing
finite sample power of rank tests is discussed in some detail in Section 3.3
of the next chapter.) Next calculate the power of $t$ for the given $\alpha$
and $\theta$ to obtain sample sizes $n'$ and $n' + 1$ for which the power of
$t$ brackets the power of $T$. We then interpolate linearly to define
$n^* = \delta n' + (1 - \delta)(n' + 1)$, the sample size for $t$.

The finite efficiency of $T$ relative to $t$ is taken to be $n^*/n$. We
illustrate Klotz's results in Table 2.4. Klotz has other tables similar to
this one. The results all indicate that the asymptotic efficiency is reflected
quite accurately in finite samples from a normal distribution with common
significance levels and moderate sizes of $\theta$. Arnold (1965) obtained
similar results for nonnormal shift alternatives.

Table 2.4. Finite Efficiency of T Relative to t for n(0.25, 1) Distribution

  n     alpha     Efficiency
  5     .0625     0.986
  8     .05469    0.980
  10    .05273    0.968

Another check on the asymptotic results is provided by Monte Carlo 
simulations of the power of the t, sign, and Wilcoxon signed rank tests. In 
Randles and Hogg (1973) and Randles and Wolfe (1979, p. 116) the power 
is simulated for the uniform, normal, logistic, double exponential, and 
Cauchy distributions. In most cases the empirical power for samples of size 
10, 15, and 20 is consistent with the results predicted by the Pitman 
efficiency. For example, in the cases of the double exponential, the sign test 
has the best Pitman efficacy, and hence should have the highest power for 
alternatives near the null hypothesis. This is indeed the case for $\theta = 0.20$;
however, for nonlocal alternatives the Wilcoxon signed rank test is better. 
Hence there is additional information to be gained from an empirical 
power study. 

Using Theorem 2.6.1, we next show that the Hodges-Lehmann estima- 
tor has an asymptotic normal distribution. This is illustrated on the Hod- 
ges-Lehmann estimator derived from the Wilcoxon signed rank test in 
Example 2.6.5 and confirms the asymptotic distribution suggested in Exam- 
ple 2.4.2. It is interesting to note that it is the asymptotic distribution of the 
test under a sequence of alternatives that determines the asymptotic distri- 
bution of the estimator. We begin with a preliminary result. 

Theorem 2.6.4. Suppose $V$ is a statistic satisfying the conditions of
Definition 1.5.1 and that $\hat\theta$ is the corresponding estimator. Then

\[ P_\theta(V(a) < \mu_0) \le P_\theta(\hat\theta < a) \le P_\theta(V(a) \le \mu_0). \]

Proof. From Definition 1.5.1 we have $\theta^{**} < a$ implies that
$V(a) < \mu_0$ and hence $\theta^{**} \le a$. These inequalities imply

\[ P(\theta^{**} < a) \le P(V(a) < \mu_0) \le P(\theta^{**} \le a). \]

Since $\theta^{**}$ is a continuous random variable (see Hodges and Lehmann,
1963), we have $P(\theta^{**} < a) = P(V(a) < \mu_0)$. Likewise,
$P(\theta^{*} < a) = P(V(a) \le \mu_0)$.

Since $\theta^{*} \le \hat\theta \le \theta^{**}$, if $\theta^{**} < a$ then
$\hat\theta < a$ and $\theta^{*} < a$. Hence
$P(\theta^{**} < a) \le P(\hat\theta < a) \le P(\theta^{*} < a)$ and finally
$P(V(a) < \mu_0) \le P(\hat\theta < a) \le P(V(a) \le \mu_0)$.

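Theorem 2.6.5. Let $\hat\theta$ be the Hodges-Lehmann estimator corresponding
to a statistic $V$ which satisfies the Pitman conditions 1-5 with efficacy
$c$. Then

\[ \lim_{n \to \infty} P_\theta[n^{1/2}(\hat\theta - \theta) < a] = \Phi(ac), \]

that is, $n^{1/2}(\hat\theta - \theta)$ is asymptotically $n(0, c^{-2})$.

Proof. By Theorem 1.5.1 we have

\[ P_\theta[n^{1/2}(\hat\theta - \theta) < a] = P_0(n^{1/2}\hat\theta < a)
   = P_0(\hat\theta < a/n^{1/2}). \]

From the previous theorem we can write

\[ P_0\bigl(V(a/n^{1/2}) < \mu_0\bigr) \le P_0(\hat\theta < a/n^{1/2})
   \le P_0\bigl(V(a/n^{1/2}) \le \mu_0\bigr). \tag{2.6.24} \]

Now $\mu_0$ is the point of symmetry of the null distribution of $V$; hence
$P_0(V(0) < \mu_0)$ converges to 0.5, which corresponds, in Theorem 2.6.1, to
$Z_\alpha = 0$. Further,

\[ P_0\bigl(V(a/n^{1/2}) < \mu_0\bigr) = P_{-a/n^{1/2}}\bigl(V(0) < \mu_0\bigr), \]

which converges to $\Phi(ac) = \Phi(a/(c^{-2})^{1/2})$ by Theorem 2.6.1. The
same limit exists for the right side of (2.6.24), and hence
$P_\theta(n^{1/2}(\hat\theta - \theta) < a)$ converges to $\Phi(ac)$.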

The asymptotic efficiency of two asymptotically normal estimators is
generally defined to be the reciprocal ratio of their asymptotic variances.
Hence, if $n^{1/2}(\hat\theta_i - \theta)$, $i = 1, 2$, are asymptotically
$n(0, \sigma_i^2)$, $i = 1, 2$, then the efficiency of $\hat\theta_1$ relative
to $\hat\theta_2$ is

\[ e(\hat\theta_1, \hat\theta_2) = \frac{\sigma_2^2}{\sigma_1^2}. \tag{2.6.25} \]

It follows at once from Theorem 2.6.5 that the efficiency of two
Hodges-Lehmann estimators derived from Pitman regular tests is identical
to the Pitman efficiency of the respective tests. Hence efficiency properties
of tests are inherited by the estimators.

Example 2.6.5. We now list the asymptotic distributions for $\operatorname{med} X_i$, $\operatorname{med}_{i \le j}(X_i + X_j)/2$, and $\bar X$ and note their relative efficiencies. These are the Hodges-Lehmann estimators corresponding to $S$, $T$, and $t$, respectively. Hence, from Example 2.6.3, they are asymptotically normally distributed with asymptotic variances $1/[4f^2(0)]$, $1/[12(\int f^2(x)\,dx)^2]$, and $\sigma_f^2$, respectively. For example, from Theorem 2.6.5 it follows that

$$e\Big(\operatorname{med}_{i \le j}\frac{X_i + X_j}{2},\ \bar X\Big) = 12\sigma_f^2\Big(\int f^2(x)\,dx\Big)^2 = e(T, t) \ge 0.864.$$
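The efficiency formula can be evaluated numerically for a given density. The following is a minimal sketch, not part of the original development: the helper names `integrate` and `efficiency_T_vs_t` are hypothetical, the integrals are approximated by a midpoint rule on a truncated range, and the two densities (standard normal and double exponential) are chosen only for illustration.

```python
import math

def integrate(g, lo, hi, m=20000):
    """Midpoint rule for the integral of g over [lo, hi] using m cells."""
    h = (hi - lo) / m
    return h * sum(g(lo + (k + 0.5) * h) for k in range(m))

def efficiency_T_vs_t(pdf, lo, hi):
    """e(T, t) = 12 * sigma_f^2 * (integral of f^2)^2 for a density symmetric about 0."""
    sigma2 = integrate(lambda x: x * x * pdf(x), lo, hi)
    int_f2 = integrate(lambda x: pdf(x) ** 2, lo, hi)
    return 12.0 * sigma2 * int_f2 ** 2

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def double_exponential_pdf(x):
    return 0.5 * math.exp(-abs(x))

# Truncation ranges are assumptions chosen so that the neglected tails are negligible.
eff_normal = efficiency_T_vs_t(normal_pdf, -12.0, 12.0)
eff_double_exp = efficiency_T_vs_t(double_exponential_pdf, -40.0, 40.0)
print(round(eff_normal, 3), round(eff_double_exp, 3))
```

For the normal density the value agrees with $3/\pi \approx 0.955$, and for the double exponential it is close to 1.5; both respect the lower bound 0.864.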


2.7. ASYMPTOTIC LINEARITY OF THE WILCOXON SIGNED
RANK STATISTIC, SUMMARY AND EFFECTS OF DEPENDENCE

In this section we discuss the approximate linearity of $T(\theta)$, (2.3.1), as a
function of $\theta$. This idea will be made precise in Theorem 2.7.1. The
approximate linearity of $T(\theta)$ enables us to study the asymptotic length of
the confidence interval derived from $T$. Further, we can offer a heuristic
development of the asymptotic distribution of $\hat\theta$ and the asymptotic local
power of $T$. Hence the approximate linearity of $T(\theta)$ ties the point and
interval estimates and test together nicely. Asymptotic linearity is crucial to 
the development of the distribution theory of rank tests and estimates in the 
linear model. At the end of this section we summarize the properties of the 
Wilcoxon procedures and briefly discuss the impact of lack of indepen- 
dence in the data. 

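We will work with

$$\bar T(\theta) = \frac{1}{n(n+1)}\,T(\theta) = \frac{1}{n(n+1)}\sum\sum_{i \le j} T_{ij}(\theta), \tag{2.7.1}$$

where $T_{ij}(\theta) = 1$ if $(X_i + X_j)/2 > \theta$ and 0 otherwise. To simplify the discussion we will suppose, without loss of generality, that the true value of $\theta$ is 0. Later we will state the results for a true value $\theta_0$. From Example 2.5.1, since $F \in \Omega_s$,

$$E_0\bar T(\theta) = E_{-\theta}\bar T(0) = \frac{1}{n+1}\,p_1(-\theta) + \frac{n-1}{2(n+1)}\,p_2(-\theta), \tag{2.7.2}$$

where

$$p_1(-\theta) = P_{-\theta}(X_1 > 0) = 1 - F(\theta), \qquad p_2(-\theta) = P_{-\theta}(X_1 + X_2 > 0) = \int [1 - F(2\theta - x)]\,f(x)\,dx.$$

Further, $\operatorname{Var}_0\bar T(\theta) = \operatorname{Var}_{-\theta}\bar T(0) \to 0$ as $n \to \infty$; hence $\bar T(\theta)$ converges in probability (when the true value of $\theta$ is 0) to $p_2(-\theta)/2$. This can be written $\bar T(\theta) = p_2(-\theta)/2 + o_p(1)$, where $o_p(1)$ are the small order terms that tend to 0 in probability (under $\theta = 0$) when $n$ increases. In general we will use the following notation: $o_p(\delta)$ means that $o_p(\delta)/\delta$ converges to 0 in probability as $\delta$ tends to 0.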

If we can differentiate (2.7.2) with respect to $\theta$ under the integral, then

$$p_2(-\theta) = p_2(0) + \theta\,p_2'(0) + o(\theta),$$

where $o(\theta)/\theta \to 0$ as $\theta \to 0$ and $p_2'(0) = -2\int f^2(x)\,dx$. Hence, for small $\theta$ and large $n$, with high probability

$$\bar T(\theta) \doteq \frac14 - \theta\int f^2(x)\,dx,$$

and this suggests that $\bar T(\theta)$ is "approximately" linear in $\theta$ with slope $-\int f^2(x)\,dx$.

Before we provide a rigorous statement of this approximate linearity, we develop a technical lemma.

Theorem 2.7.1. Let $V_n(b) = U_n(b) + c_n b$, where $U_n(b)$ is monotone in $b$ and $|c_n| \le c < \infty$. Suppose that for each $b$, $V_n(b) \to 0$ in probability as $n \to \infty$. Then for any $B > 0$ and any $\epsilon > 0$, as $n \to \infty$,

$$P\Big\{\sup_{|b| \le B}|V_n(b)| \ge \epsilon\Big\} \to 0.$$

Proof. Let $\epsilon > 0$, $\gamma > 0$ be given. Partition $[-B, B]$ into $d$ intervals: $-B = b_0 < b_1 < \cdots < b_d = B$ such that $|b_i - b_{i-1}| < \epsilon/2c$ for $i = 1, \ldots, d$. Since there are finitely many $b_i$, we can find an integer $N$ such that $n > N$ implies

$$P\Big\{\max_i|V_n(b_i)| < \frac{\epsilon}{2}\Big\} > 1 - \gamma.$$

Without loss of generality suppose $U_n(b)$ is nonincreasing. If $|b| \le B$ then $b_{i-1} \le b \le b_i$ for some $i$. First suppose $V_n(b) \ge 0$. Then $|V_n(b)| = V_n(b)$ and we can write

$$|V_n(b)| = U_n(b) + c_n b \le U_n(b_{i-1}) + c_n b_{i-1} + c_n(b - b_{i-1}) \le |V_n(b_{i-1})| + |c_n|(b - b_{i-1}) \le \max_i|V_n(b_i)| + \frac{\epsilon}{2}.$$

A similar argument applies when $V_n(b) < 0$, hence

$$\sup_{|b| \le B}|V_n(b)| \le \max_i|V_n(b_i)| + \frac{\epsilon}{2}.$$

Finally, for $n > N$,

$$P\Big\{\sup_{|b| \le B}|V_n(b)| \ge \epsilon\Big\} \le P\Big\{\max_i|V_n(b_i)| + \frac{\epsilon}{2} \ge \epsilon\Big\} = P\Big\{\max_i|V_n(b_i)| \ge \frac{\epsilon}{2}\Big\} < \gamma,$$

and this completes the proof.

We now formalize the ideas of approximate linearity for the Wilcoxon
signed rank statistic.

Theorem 2.7.2. Suppose that $F \in \Omega_s$ and $\int f^2(x)\,dx < \infty$. Suppose the true value $\theta = 0$. Then, denoting probabilities computed under $\theta = 0$ by $P_0(\cdot)$, for $\epsilon > 0$ and $B > 0$,

$$P_0\Big\{\sup_{|b| \le B}\Big|\sqrt{n}\,[\bar T(b/\sqrt{n}) - \bar T(0)] + b\int f^2(x)\,dx\Big| \ge \epsilon\Big\} \to 0. \tag{2.7.3}$$

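Proof. In Theorem 2.7.1 let $U_n(b) = \sqrt{n}\,[\bar T(b/\sqrt{n}) - \bar T(0)]$ and $c_n = \int f^2(x)\,dx$; then we must first show that $U_n(b) \to -b\int f^2(x)\,dx$ in probability. Now

$$EU_n(b) = -\frac{\sqrt{n}}{n+1}\Big\{[F(b/\sqrt{n}) - F(0)] + \frac{n-1}{2}\,[F^*(b/\sqrt{n}) - F^*(0)]\Big\},$$

where $F^*(t) = P((X_1 + X_2)/2 \le t) = \int F(2t - x)f(x)\,dx$. Hence $EU_n(b) \doteq -\sqrt{n}\,[F^*(b/\sqrt{n}) - F^*(0)]/2$.

Provided the derivative exists, we have

$$EU_n(b) \doteq -\frac{b}{2}\cdot\frac{F^*(b/\sqrt{n}) - F^*(0)}{b/\sqrt{n}} \to -\frac{b}{2}\,F^{*\prime}(0).$$

Since $F^*$ is a convolution of two absolutely continuous cdfs, Theorem A17 implies that it is also absolutely continuous with pdf at 0 given by

$$2\int f(-x)f(x)\,dx = 2\int f^2(x)\,dx.$$

This shows that $EU_n(b) \to -b\int f^2(x)\,dx$.

Next we must consider $\operatorname{Var}U_n(b)$. Let $I_{ij} = 1$ if $0 < (X_i + X_j)/2 \le b/n^{1/2}$ and 0 otherwise; then

$$\operatorname{Var}U_n(b) = \frac{n}{n^2(n+1)^2}\operatorname{Var}\Big(\sum\sum_{i \le j} I_{ij}\Big).$$

Now using the argument in Theorem 2.5.1, the leading term is

$$\operatorname{Var}\Big(\sum\sum_{i \le j} I_{ij}\Big) \doteq n(n-1)(n-2)\operatorname{Cov}(I_{12}, I_{13}),$$

hence $\operatorname{Var}U_n(b) \doteq \operatorname{Cov}(I_{12}, I_{13})$ and

$$|\operatorname{Cov}(I_{12}, I_{13})| = |EI_{12}I_{13} - (EI_{12})^2| \le EI_{12} + (EI_{12})^2.$$

But $EI_{12} = F^*(b/\sqrt{n}) - F^*(0) \to 0$ and so $\operatorname{Var}U_n \to 0$. By Theorem A5, $U_n(b) \to -b\int f^2(x)\,dx$ and $V_n(b) \to 0$ in probability. Since $U_n(b)$ is monotone (nonincreasing) in $b$, we can apply Theorem 2.7.1 and the proof is complete.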


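Now, (2.7.3) can be expanded and rewritten in various ways. Equations (2.7.4)-(2.7.6) provide three forms that are useful. First we have

$$\sqrt{n}\,\bar T(b/\sqrt{n}) = \sqrt{n}\,\bar T(a/\sqrt{n}) - (b - a)\int f^2(x)\,dx + o_p(1) \tag{2.7.4}$$

uniformly for $a, b$ such that $-A \le a \le b \le A$. Next, if we take $\theta = b/n^{1/2}$ and $\theta_0 = a/n^{1/2}$ so that $\sqrt{n}\,|\theta - \theta_0| \le A$, then

$$\sqrt{n}\,\bar T(\theta) \doteq \sqrt{n}\,\bar T(\theta_0) - \sqrt{n}\,(\theta - \theta_0)\int f^2(x)\,dx, \tag{2.7.5}$$

which provides a kind of Taylor series approximation to $n^{1/2}\bar T(\theta)$ about $\theta_0$. If $\theta_0$ denotes the true value of $\theta$, so that $E_{\theta_0}\bar T(\theta_0) = 1/4$ and $\operatorname{Var}_{\theta_0}\bar T(\theta_0) \doteq 1/12n$, then we have, for $n^{1/2}|\theta - \theta_0| \le A$,

$$\frac{\sqrt{n}\,[\bar T(\theta) - 1/4]}{\sqrt{1/12}} \doteq \frac{\sqrt{n}\,[\bar T(\theta_0) - 1/4]}{\sqrt{1/12}} - \sqrt{n}\,(\theta - \theta_0)\sqrt{12}\int f^2(x)\,dx. \tag{2.7.6}$$

From (2.7.6) we see that $\bar T(\theta)$, when standardized, has approximate slope $-n^{1/2}c_T$, where $c_T = \sqrt{12}\int f^2(x)\,dx$ is the efficacy of $T$ given in (2.6.13). This ties together the properties of tests and the estimators derived from them.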
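The approximate linearity can be inspected numerically. The sketch below is only an illustration under stated assumptions: it takes $\bar T(\theta)$ to be the number of Walsh averages $(X_i + X_j)/2$, $i \le j$, exceeding $\theta$, divided by $n(n+1)$; the simulated normal sample, the seed, and the function name `t_bar` are hypothetical choices; and $\int f^2(x)\,dx = 1/(2\sqrt{\pi})$ is the value for the standard normal density.

```python
import math
import random

def t_bar(xs, theta):
    """Scaled count of Walsh averages (i <= j) that exceed theta."""
    n = len(xs)
    count = sum(1 for i in range(n) for j in range(i, n)
                if (xs[i] + xs[j]) / 2.0 > theta)
    return count / (n * (n + 1))

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(300)]
int_f2 = 1.0 / (2.0 * math.sqrt(math.pi))   # integral of f^2 for the n(0, 1) density

# Compare the step function with the line through T_bar(0) having slope -int_f2.
for theta in (-0.2, -0.1, 0.0, 0.1, 0.2):
    exact = t_bar(sample, theta)
    linear = t_bar(sample, 0.0) - theta * int_f2
    print(f"theta={theta:+.1f}  T_bar={exact:.4f}  linear={linear:.4f}")
```

For a sample of this size the two columns should be close for small $|\theta|$, which is the content of (2.7.5) with $\theta_0 = 0$.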

Our first application of (2.7.6) is a derivation of the asymptotic length of the confidence interval, (2.3.3), derived from the Wilcoxon signed rank test $T$. Recall that if $P(T \le C_1) = P(T \ge N - C_1) = \alpha/2$ (let $C_2 = N - C_1$ denote the upper critical point) then $\hat\theta_L$ is the $(C_1 + 1)$st or $(N - C_2 + 1)$st ordered Walsh average. From the discussion following Definition 1.5.1, $T(\hat\theta_L) = C_2 - \frac12$. Since the result is an asymptotic one, we will transfer the argument to $\bar T$ with $E\bar T = 1/4$ and $\operatorname{Var}\bar T \doteq 1/12n$. Since $\bar T$ has an approximate normal distribution we can write

$$\frac{\sqrt{n}\,[\bar T(\hat\theta_L) - 1/4]}{\sqrt{1/12}} \doteq z_{\alpha/2}, \tag{2.7.7}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ percentile of the standard normal distribution. From Exercise 2.10.17, $\sqrt{n}\,(\hat\theta_L - \theta_0)$ has an asymptotic normal distribution. Hence, for $\epsilon > 0$ there exists a positive integer $n_0$ and a positive number $A$ such that when $n > n_0$, $P(\sqrt{n}\,|\hat\theta_L - \theta_0| \le A) > 1 - \epsilon$, and we will say $\sqrt{n}\,(\hat\theta_L - \theta_0)$ is bounded in probability.

The boundedness in probability of $\sqrt{n}\,(\hat\theta_L - \theta_0)$ allows us to combine (2.7.7) with (2.7.6) for large $n$ with high probability and write

$$z_{\alpha/2} \doteq \frac{\sqrt{n}\,[\bar T(\theta_0) - 1/4]}{\sqrt{1/12}} - \sqrt{n}\,(\hat\theta_L - \theta_0)\sqrt{12}\int f^2(x)\,dx.$$

Similarly,

$$-z_{\alpha/2} \doteq \frac{\sqrt{n}\,[\bar T(\theta_0) - 1/4]}{\sqrt{1/12}} - \sqrt{n}\,(\hat\theta_U - \theta_0)\sqrt{12}\int f^2(x)\,dx.$$

Subtracting and rearranging we find

$$\sqrt{n}\,(\hat\theta_U - \hat\theta_L) \doteq \frac{2z_{\alpha/2}}{\sqrt{12}\int f^2(x)\,dx}, \tag{2.7.8}$$

where the approximation means for large $n$ with high probability. More formally we see that the standardized length of the Wilcoxon confidence interval, $\sqrt{n}\,(\hat\theta_U - \hat\theta_L)/2z_{\alpha/2}$, converges in probability to $1/[(12)^{1/2}\int f^2(x)\,dx] = 1/c_T$, the reciprocal of the efficacy. Note that we need the uniform convergence to replace $\theta$ in (2.7.6) by $\hat\theta_L$ and $\hat\theta_U$.

Later we will see this is a general property of confidence intervals
derived from tests. In the exercises, you are asked to verify this for the sign 
and t tests. The ratio of squared lengths of two intervals is a measure of the 
relative efficiency of the intervals, and this ratio converges in probability to 
the Pitman efficiency of the respective tests. In this sense the test, point 
estimate, and confidence interval all share common efficiency properties. 
The results of Examples 2.6.3 and 2.6.5 on the sign, Wilcoxon, and t 
procedures now extend to the confidence intervals. 

Our next application of (2.7.6) is a heuristic derivation of the asymptotic normality of $\sqrt{n}\,(\hat\theta - \theta_0)$, where $\theta_0$ is the true parameter value and $\hat\theta = \operatorname{med}_{i \le j}(X_i + X_j)/2$; see Example 2.6.5. The estimate $\hat\theta$ can be defined by

$$\frac{\sqrt{n}\,[\bar T(\hat\theta) - 1/4]}{\sqrt{1/12}} \doteq 0.$$

Hence, from (2.7.6),

$$\sqrt{n}\,(\hat\theta - \theta_0) \doteq \frac{1}{\sqrt{12}\int f^2(x)\,dx}\cdot\frac{\sqrt{n}\,[\bar T(\theta_0) - 1/4]}{\sqrt{1/12}}.$$

The second factor is the standardized Wilcoxon signed rank test which is asymptotically standard normal. Hence the right side is asymptotically normal with mean 0 and variance $1/[12(\int f^2(x)\,dx)^2]$. This heuristic derivation shows how the asymptotic distributions of the test and estimate are related through the approximate linearity, with the efficacy playing an important part. A rigorous argument would require that we first show $\sqrt{n}\,(\hat\theta - \theta_0)$ is bounded in probability (Theorem 2.6.4).

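The final application of (2.7.6) is a heuristic derivation of the local asymptotic power; see Theorem 2.6.1. Let $\theta_n = \theta/n^{1/2}$. The asymptotic size $\alpha$ critical region for $T$ is determined by

$$\frac{\sqrt{n}\,[\bar T(0) - 1/4]}{\sqrt{1/12}} \ge z_\alpha,$$

where $z_\alpha$ is the upper $\alpha$ percentile of the standard normal distribution. The power is then

$$P_{\theta_n}\Big\{\frac{\sqrt{n}\,[\bar T(0) - 1/4]}{\sqrt{1/12}} \ge z_\alpha\Big\} = P_0\Big\{\frac{\sqrt{n}\,[\bar T(-\theta_n) - 1/4]}{\sqrt{1/12}} \ge z_\alpha\Big\} \doteq P_0\Big\{\frac{\sqrt{n}\,[\bar T(0) - 1/4]}{\sqrt{1/12}} \ge z_\alpha - \theta\sqrt{12}\int f^2(x)\,dx\Big\} \doteq 1 - \Phi\Big(z_\alpha - \theta\sqrt{12}\int f^2(x)\,dx\Big).$$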
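As a small numerical illustration of the local power approximation, the sketch below evaluates $1 - \Phi(z_\alpha - \theta c_T)$ with $c_T = \sqrt{12}\int f^2(x)\,dx$. The function name `local_power` is hypothetical, and the value $\int f^2(x)\,dx = 1/(2\sqrt{\pi})$ assumes a standard normal underlying density.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def local_power(theta, int_f2, alpha=0.05):
    """Approximate power 1 - Phi(z_alpha - theta * c_T) at theta_n = theta / sqrt(n)."""
    c_T = math.sqrt(12.0) * int_f2          # efficacy of T
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_alpha - theta * c_T)

normal_int_f2 = 1.0 / (2.0 * math.sqrt(math.pi))
for theta in (0.0, 1.0, 2.0, 3.0):
    print(theta, round(local_power(theta, normal_int_f2), 3))
```

At $\theta = 0$ the approximation returns the nominal level $\alpha$, and it increases with $\theta$ at a rate governed by the efficacy.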

We now summarize the properties of the Wilcoxon procedures. We suppose that $X_1, \ldots, X_n$ are i.i.d. $F(x - \theta)$, $F \in \Omega_s$; hence we have the symmetric location model. The Wilcoxon signed rank statistic $T$ is distribution free under $H_0: \theta = 0$. The test is unbiased (apply Exercise 1.8.4), consistent, and has positive tolerance (asymptotically 0.29) to both acceptance and rejection. The symmetric distribution of $T$ under $H_0$ is easy to table, and the asymptotic distribution is normal under both null and alternative hypotheses. The Pitman efficacy is given by $c_T = 12^{1/2}\int f^2(x)\,dx$, and the Pitman asymptotic efficiency relative to the $t$ test is 0.955 for an underlying normal distribution, 1.19 for a 5% contaminated normal distribution, and is never less than 0.864. For heavy tailed distributions around a double exponential distribution the sign test is more efficient. Furthermore, the Pitman efficiency reflects quite well the small sample and nonlocal alternative properties of $T$. In Section 6.2 the Wilcoxon signed rank test is extended to the one sample multivariate location model.

The Hodges-Lehmann estimate of $\theta$ is $\hat\theta = \operatorname{med}(X_i + X_j)/2$, $i \le j$. The estimate is unbiased and symmetrically distributed about $\theta$. It has positive tolerance (asymptotically .29) and a bounded, continuous influence curve. Hence it is a robust estimate. Moreover, $n^{1/2}(\hat\theta - \theta)$ is asymptotically normally distributed with asymptotic variance $1/c_T^2 = 1/[12(\int f^2(x)\,dx)^2]$. Hence $\hat\theta$ inherits the efficiency properties of $T$.

The confidence interval generated from $T$ is distribution free and $n^{1/2}(\text{length})/2z_{\alpha/2}$ converges in probability to $1/c_T$. Hence, when efficiency of confidence intervals is defined in terms of their lengths, the Wilcoxon confidence interval inherits its efficiency properties directly from $T$. Finally, $T(\theta)$, as a function of $\theta$, is a nonincreasing step function with steps at the Walsh averages and is approximately linear in $\theta$ for large $n$. The slope of the linear approximation is proportional to $c_T$. The approximate linearity can be used to show how $c_T$ determines the properties of the efficiency of the test (through local asymptotic power), the asymptotic variance of the estimate, and the asymptotic length of the confidence interval.
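The Hodges-Lehmann estimate summarized above is simple to compute directly from its definition as the median of the Walsh averages. The following is a minimal sketch; the function names and the small data set (which includes one deliberately large value) are hypothetical, and the direct enumeration of all $n(n+1)/2$ averages is only practical for moderate $n$.

```python
import statistics

def walsh_averages(xs):
    """All n(n + 1)/2 pairwise averages (X_i + X_j)/2 with i <= j."""
    n = len(xs)
    return [(xs[i] + xs[j]) / 2.0 for i in range(n) for j in range(i, n)]

def hodges_lehmann(xs):
    """Median of the Walsh averages."""
    return statistics.median(walsh_averages(xs))

data = [2.1, -0.4, 3.3, 0.9, 15.0]      # hypothetical sample with one outlier
print(hodges_lehmann(data), statistics.mean(data))
```

In this example the estimate stays near the bulk of the data while the sample mean is pulled toward the outlier, in line with the positive tolerance noted above.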

We now consider the simple model of serial correlation introduced at the end of Section 1.7. The model specifies that $(X_i, X_{i+1})$, $i = 1, 2, \ldots$, has a bivariate normal distribution with means 0, variances 1, and correlation $\rho$. The results of Gastwirth and Rubin (1975), which will not be derived here, show that, for the serial correlation model, the projection still determines the limiting distribution of the Wilcoxon signed rank statistic. (See their equation 3.25.) In our model, the marginal distributions are standard normal, denoted $\Phi(\cdot)$. Hence, from (2.5.18) in Example 2.5.5, $n^{-1/2}V_p = n^{-1/2}\sum(\Phi(X_i) - 1/2)$ determines the limiting distribution of $T$ under $H_0: \theta = 0$. The sequence $\Phi(X_1), \Phi(X_2), \ldots$ is a 1-dependent sequence, and Theorem A16 implies that $n^{-1/2}V_p$ is asymptotically $n(0, \sigma^2)$ with

$$\sigma^2 = \operatorname{Var}\Phi(X_1) + 2\operatorname{Cov}(\Phi(X_1), \Phi(X_2)).$$

Hence, from the discussion in Example 2.5.5, $n^{-3/2}(T - ET)$ or $n^{1/2}(\bar T - 1/4)$, under $H_0$, have the same limiting distribution as $n^{-1/2}V_p$. The random variable $\Phi(X_1)$ has a uniform distribution on (0, 1), and so $\operatorname{Var}\Phi(X_1) = 1/12$. Next, we consider

$$\operatorname{Cov}(\Phi(X_1), \Phi(X_2)) = E[\Phi(X_1)\Phi(X_2)] - E\Phi(X_1)E\Phi(X_2). \tag{2.7.9}$$

The expectation is taken with respect to the bivariate normal distribution, denoted $\Phi(x, y)$. Let $U$ and $V$ be independent $n(0, 1)$ variables, also independent of $(X_1, X_2)$; then

$$E\Phi(X_1)\Phi(X_2) = \int\!\!\int P(U \le x_1)P(V \le x_2)\,d\Phi(x_1, x_2) = P(U \le X_1, V \le X_2) = P(U - X_1 \le 0, V - X_2 \le 0). \tag{2.7.10}$$

Note that $E(U - X_1)(V - X_2) = EX_1X_2 = \rho$ and $\operatorname{Var}(U - X_1) = \operatorname{Var}(V - X_2) = 2$; hence the correlation between $U - X_1$ and $V - X_2$ is $\rho/2$. In fact, $(U - X_1, V - X_2)$ has a bivariate normal distribution with means 0, variances 2, and correlation $\rho/2$. The probability $P(U - X_1 \le 0, V - X_2 \le 0)$ does not depend on the variances (divide the inequalities by $2^{1/2}$), hence Exercise 1.8.14 shows the probability to be equal to $1/4 + (1/2\pi)\sin^{-1}(\rho/2)$. Since $E\Phi(X_1)E\Phi(X_2) = 1/4$, (2.7.9) becomes $(1/2\pi)\sin^{-1}(\rho/2)$ and

$$\sigma^2 = \frac{1}{12} + \frac{1}{\pi}\sin^{-1}(\rho/2). \tag{2.7.11}$$

Thus, $n^{1/2}(\bar T - 1/4)$ is asymptotically $n(0, \sigma^2)$, and, if we suppose that we are sampling from an i.i.d. sequence, then a nominal 5% test, for large $n$, is given by $\bar T \ge 1/4 + 1.645/(12n)^{1/2}$. However, the true level is

$$\alpha_T = P\big(\bar T \ge 1/4 + 1.645/\sqrt{12n}\big) = P\big(\sqrt{n}\,(\bar T - 1/4) \ge 1.645/\sqrt{12}\big) \doteq 1 - \Phi\Bigg(\frac{1.645}{\big[1 + \frac{12}{\pi}\sin^{-1}(\rho/2)\big]^{1/2}}\Bigg). \tag{2.7.12}$$

Another line can now be added to Table 1.4 reflecting the true level of $T$. However, the line for $T$ is almost identical to that of $\bar X$. This is not surprising since $\sin^{-1}r = r + r^3/6 + \cdots$, and so $1 + (12/\pi)\sin^{-1}(\rho/2) \doteq 1 + (12/2\pi)\rho \doteq 1 + 2\rho$, the asymptotic variance corresponding to $\bar X$. Hence, with this simple model of dependence in the data, the superior stability of level of $T$ over $t$ completely vanishes.
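The effect of the serial correlation on the level can be tabulated directly from the variance inflation factors $1 + (12/\pi)\sin^{-1}(\rho/2)$ for the Wilcoxon statistic and $1 + 2\rho$ for $\bar X$. The sketch below is an illustration only; the function names and the grid of $\rho$ values are hypothetical choices, and the one-sided critical value 1.645 corresponds to the nominal 5% test.

```python
import math
from statistics import NormalDist

def true_level_T(rho, z=1.645):
    """Asymptotic level of the nominal 5% Wilcoxon test under serial correlation rho."""
    variance_ratio = 1.0 + (12.0 / math.pi) * math.asin(rho / 2.0)
    return 1.0 - NormalDist().cdf(z / math.sqrt(variance_ratio))

def true_level_mean(rho, z=1.645):
    """Same calculation with the variance ratio 1 + 2*rho that corresponds to X-bar."""
    return 1.0 - NormalDist().cdf(z / math.sqrt(1.0 + 2.0 * rho))

for rho in (-0.2, 0.0, 0.2, 0.4):
    print(rho, round(true_level_T(rho), 4), round(true_level_mean(rho), 4))
```

The two columns nearly coincide over this range, which is the numerical counterpart of the remark that the stability advantage of $T$ vanishes under this model.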


2.8. GENERAL SCORES STATISTICS

In the first seven sections of this chapter we have developed methods for
testing and estimation based on the Wilcoxon signed rank statistic. The
necessary distribution theory, both finite and asymptotic, for constructing
tests and confidence intervals is given in Sections 2.1-2.3. In the later
sections we developed the local asymptotic power and efficiency of $T$. In
particular, we find that if the underlying population is symmetric, then $T$
often provides a more efficient set of inference methods than the sign or $t$
tests. However, $T$ is not uniformly best, and no optimality theory similar to
that of the sign test in Chapter 1 is possible.

In this section we generalize from T, which is based on the ranks of the 
absolute values of the observations, to statistics based on functions (called 
scores) of the ranks of the absolute values. We provide the asymptotic 
theory necessary to construct approximate tests and confidence intervals. 
We give a heuristic development of the efficiency properties in the next 
section and consider the problem of determining the most efficient rank 
score statistic for a given underlying distribution. 

Definition 2.8.1. Let $0 = a(0) \le a(1) \le \cdots \le a(n)$ be a nonconstant sequence and define

$$V = \sum_{j=1}^n a(R_j)s(X_j) = \sum_{j=1}^n a(j)W_j = \sum_{j=1}^n a(j)s(X_{D_j}), \tag{2.8.1}$$

where $R_j$ is the rank of $|X_j|$ among $|X_1|, \ldots, |X_n|$, $W_j$ is given by (2.2.1), and $D_j$ is the antirank in Definition 2.2.1. Then $V$ is called a signed rank score statistic. Note that if $a(j) = 1$, $j = 1, \ldots, n$, then $V = S$, and if $a(j) = j$, then $V = T$.

In Exercise 2.10.4 you are asked to show that if $X_1, \ldots, X_n$ are i.i.d. $F(x)$, $F \in \Omega_s$, then

$$EV_1 = \frac12\sum_{j=1}^n a_j, \qquad \operatorname{Var}V_1 = \frac14\sum_{j=1}^n a_j^2, \qquad\text{and}\qquad \operatorname{Cov}(V_1, V_2) = \frac14\sum_{j=1}^n a_jb_j, \tag{2.8.2}$$

where $V_1 = \sum a_jW_j$ and $V_2 = \sum b_jW_j$. It follows at once from Theorem A10 that $(V - EV)/(\operatorname{Var}V)^{1/2}$ has an asymptotic standard normal distribution provided

$$\lim_{n \to \infty}\frac{\max_{1 \le j \le n}a_j}{\big(\sum_{j=1}^n a_j^2\big)^{1/2}} = 0. \tag{2.8.3}$$

Hence, using (2.8.2) and (2.8.3), we can easily determine when a normal approximation is possible for a signed rank score statistic $V$. Often the scores are given by a score generating function.

Definition 2.8.2. Suppose $\phi(u)$, $0 < u < 1$, is nonnegative and nondecreasing. Suppose further that $\int_0^1\phi(u)\,du < \infty$ and $0 < \int_0^1\phi^2(u)\,du < \infty$. Define $a(i) = \phi(i/(n+1))$. Then

$$V = \sum_{j=1}^n\phi\Big(\frac{R_j}{n+1}\Big)s(X_j)$$

is the statistic generated by the score generating function $\phi(\cdot)$. Note that $\phi(u) = 1$, $0 < u < 1$, produces $S$ and $\phi(u) = u$ produces $T$.

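Theorem 2.8.1. If $\phi(\cdot)$ generates $V$ then

$$\frac1n\,EV \to \frac12\int_0^1\phi(u)\,du, \qquad \frac1n\operatorname{Var}V \to \frac14\int_0^1\phi^2(u)\,du,$$

and

$$\frac{V - EV}{\sqrt{\operatorname{Var}V}}$$

has an asymptotically standard normal distribution.

Proof. The moments follow directly from (2.8.2) and the definition of the Riemann integral. To establish asymptotic normality from (2.8.3) we must show

$$\frac{\phi(n/(n+1))}{\big[\sum_{i=1}^n\phi^2(i/(n+1))\big]^{1/2}} \to 0 \quad\text{as } n \to \infty.$$

We write the square of the left side as

$$\frac{\frac1n\,\phi^2(n/(n+1))}{\frac1n\sum_{i=1}^n\phi^2(i/(n+1))}.$$

We now need only show the numerator converges to 0, since the denominator converges to $0 < \int_0^1\phi^2(u)\,du < \infty$. Since $\phi(u)$ is nondecreasing, we have

$$\frac1n\,\phi^2\Big(\frac{n}{n+1}\Big) \le \frac{n+1}{n}\int_{n/(n+1)}^1\phi^2(u)\,du.$$

Since $0 < \int_0^1\phi^2(u)\,du < \infty$, as $n \to \infty$ the right side tends to 0.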


This theorem shows that for a large class of possible test statistics, we
can use a normal approximation to construct critical values for the tests. In
Exercise 2.10.20 it is shown that $(V_1, V_2)$, when properly standardized, has
an asymptotic bivariate normal distribution.
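A signed rank score statistic generated by a score function is straightforward to compute. The following sketch is illustrative only: the function names and the sample are hypothetical, ties among the absolute values are assumed absent, $s(x)$ is taken to be 1 for $x > 0$ and 0 otherwise, and the null moments are taken as $\frac12\sum a_j$ and $\frac14\sum a_j^2$.

```python
def signed_rank_score_statistic(xs, phi):
    """V = sum_j phi(R_j / (n + 1)) * s(X_j), with R_j the rank of |X_j|."""
    n = len(xs)
    order = sorted(range(n), key=lambda j: abs(xs[j]))   # indices by increasing |x|
    v = 0.0
    for rank, j in enumerate(order, start=1):
        if xs[j] > 0:                                    # s(x) = 1 only for positive x
            v += phi(rank / (n + 1))
    return v

def null_moments(n, phi):
    """Null mean and variance: (1/2) * sum a_j and (1/4) * sum a_j^2."""
    scores = [phi(i / (n + 1)) for i in range(1, n + 1)]
    return 0.5 * sum(scores), 0.25 * sum(a * a for a in scores)

sample = [1.2, -0.5, 3.1, -2.4, 0.7]                      # hypothetical data
print(signed_rank_score_statistic(sample, lambda u: 1.0))  # sign statistic S
print(signed_rank_score_statistic(sample, lambda u: u))    # T / (n + 1)
print(null_moments(len(sample), lambda u: u))
```

With $\phi(u) = 1$ the function counts the positive observations; with $\phi(u) = u$ it returns the Wilcoxon signed rank statistic divided by $n + 1$.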

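Bickel (1974) provides an Edgeworth approximation for the general scores statistic. Provided the regularity conditions discussed by Bickel are satisfied, we have

$$P\Big(\frac{V - EV}{\sqrt{\operatorname{Var}V}} \le t\Big) \doteq \Phi(t) + \frac{\int_0^1\phi^4(u)\,du}{12n\big(\int_0^1\phi^2(u)\,du\big)^2}\,(t^3 - 3t)\,\Phi'(t), \tag{2.8.4}$$

where $\Phi'(t)$ is the $n(0, 1)$ pdf.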

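A heuristic development of the asymptotic testing tolerance, Definition 1.6.2, can be given for the general scores test. In accordance with the definition of tolerance to acceptance, for given values of $x_{a+1}, \ldots, x_n$, we will take $x_1, \ldots, x_a$ to be negative with large absolute values. Then we have

$$V \le \sum_{i=1}^{n-a}\phi\Big(\frac{i}{n+1}\Big),$$

and we fail to reject if

$$\sum_{i=1}^{n-a}\phi\Big(\frac{i}{n+1}\Big) < \frac n2\int_0^1\phi(u)\,du + z_\alpha\Big(\frac n4\int_0^1\phi^2(u)\,du\Big)^{1/2}.$$

We have used the asymptotic mean and standard deviation to define the approximate critical value. Tolerance to acceptance is the smallest such $a$, and in this heuristic development we will suppose that for large $n$, $a \doteq n\epsilon$. Hence $\epsilon$ is the asymptotic tolerance to acceptance. Then for large $n$,

$$\frac1n\sum_{i=1}^{n-a}\phi\Big(\frac{i}{n+1}\Big) \doteq \int_0^{1-\epsilon}\phi(u)\,du,$$

and $\epsilon$ is the solution of

$$\int_0^{1-\epsilon}\phi(u)\,du = \frac12\int_0^1\phi(u)\,du,$$

since the second term in the definition of the critical value, after division by $n$, tends to 0. In the case of tolerance to rejection fix $x_1, \ldots, x_{n-b}$ and choose $x_{n-b+1}, \ldots, x_n$ positive with large absolute value. Then

$$V \ge \sum_{i=n-b+1}^{n}\phi\Big(\frac{i}{n+1}\Big),$$

and we reject if

$$\sum_{i=n-b+1}^{n}\phi\Big(\frac{i}{n+1}\Big) \ge \frac n2\int_0^1\phi(u)\,du + z_\alpha\Big(\frac n4\int_0^1\phi^2(u)\,du\Big)^{1/2}.$$

Again, if we suppose that $b \doteq n\delta$ for large $n$, then the asymptotic tolerance to rejection is $\delta$, defined by

$$\int_{1-\delta}^1\phi(u)\,du = \frac12\int_0^1\phi(u)\,du.$$

In typical cases, $\delta = \epsilon$, and this asymptotic testing tolerance is the same as that defined in a much more abstract setting by Rieder (1982). It is also the same as the breakdown point of an estimator derived from a rank statistic; see Huber (1981, p. 67).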



Example 2.8.1. Normal Scores Statistic. Let $\Phi$ denote the standard normal cdf and define $\Phi_+(x) = 2\Phi(x) - 1 = P(|X| \le x)$. Let $\phi(u) = \Phi_+^{-1}(u)$; the statistic $V$ is then

$$V = \sum_{j=1}^n\Phi_+^{-1}\Big(\frac{R_j}{n+1}\Big)s(X_j),$$

proposed by Fraser (1957) as a one-sample normal scores statistic. Note that $\Phi_+^{-1}(i/(n+1)) = \Phi^{-1}(1/2 + i/2(n+1))$ is roughly equal to $E|X|_{(i)}$.

In Exercise 2.10.21 you are asked to show that $\phi(u) = \Phi_+^{-1}(u)$ satisfies Definition 2.8.2 and hence is a scores-generating function. Approximate normality then follows from Theorem 2.8.1. If $E|X|_{(i)}$ rather than $\Phi_+^{-1}(i/(n+1))$ is used in the computation of $V$, then tables of the values of $E|X|_{(i)}$ are available in Govindarajulu and Eisenstat (1965) and Klotz (1963). A simple approximation to $\Phi^{-1}(u)$ is given by $4.91[u^{.14} - (1 - u)^{.14}]$. This approximation is based on Tukey's $\lambda$-distribution and its accuracy is discussed by Joiner and Rosenblatt (1971).
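The quality of the simple approximation to $\Phi^{-1}(u)$ can be checked against a library quantile function. The sketch below assumes the usual form $4.91[u^{0.14} - (1-u)^{0.14}]$ of the Tukey-lambda approximation; the function name and the grid $0.01, 0.02, \ldots, 0.99$ are hypothetical choices made for illustration.

```python
from statistics import NormalDist

def approx_normal_quantile(u):
    """Tukey-lambda style approximation 4.91 * [u^0.14 - (1 - u)^0.14]."""
    return 4.91 * (u ** 0.14 - (1.0 - u) ** 0.14)

exact = NormalDist().inv_cdf
grid = [k / 100.0 for k in range(1, 100)]
max_error = max(abs(approx_normal_quantile(u) - exact(u)) for u in grid)
print(round(max_error, 4))
```

On this grid the maximum absolute error is well below 0.01; the approximation deteriorates further out in the tails.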

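Since $\Phi_+^{-1}(u) = \Phi^{-1}[(u + 1)/2]$, the asymptotic tolerance to acceptance is defined by

$$\int_0^{1-\epsilon}\Phi^{-1}\Big[\frac{u+1}{2}\Big]du = \frac12\int_0^1\Phi^{-1}\Big[\frac{u+1}{2}\Big]du.$$

Exercise 2.10.21 can be used to reduce this to

$$\exp\Big\{-\frac12\Big[\Phi^{-1}\Big(1 - \frac\epsilon2\Big)\Big]^2\Big\} = \frac12,$$

and $\epsilon = 2(1 - \Phi(\sqrt{\log 4})) = .239$. A similar computation shows that the asymptotic tolerance to rejection is also .239. This is a bit less than .293, the asymptotic tolerance of the Wilcoxon signed rank test (Exercise 2.10.5); however, the normal scores test is still much more resistant to outliers than the $t$ test.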
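The tolerance values quoted in this example can be reproduced numerically. The sketch below assumes that the asymptotic tolerance to acceptance $\epsilon$ of a score function $\phi$ solves $\int_0^{1-\epsilon}\phi(u)\,du = \frac12\int_0^1\phi(u)\,du$, a reading consistent with the values .293 and .239 given in the text. The helper names are hypothetical, the integrals use a midpoint rule, and the root is found by bisection.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def integrate(g, lo, hi, m=4000):
    """Midpoint rule for the integral of g over [lo, hi] using m cells."""
    h = (hi - lo) / m
    return h * sum(g(lo + (k + 0.5) * h) for k in range(m))

def asymptotic_tolerance(phi, iters=40):
    """Solve int_0^{1-eps} phi = (1/2) int_0^1 phi for eps by bisection."""
    half = 0.5 * integrate(phi, 0.0, 1.0)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if integrate(phi, 0.0, 1.0 - mid) > half:
            lo = mid        # retained area still exceeds half: eps must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

def wilcoxon_score(u):
    return u

def normal_score(u):
    return STD_NORMAL.inv_cdf((u + 1.0) / 2.0)

tol_wilcoxon = asymptotic_tolerance(wilcoxon_score)
tol_normal_scores = asymptotic_tolerance(normal_score)
print(round(tol_wilcoxon, 3), round(tol_normal_scores, 3))
```

The midpoint rule never evaluates the normal score at $u = 1$, where it is unbounded, so the integrable singularity causes no difficulty in this sketch.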


Example 2.8.2. Winsorized Signed Rank Statistics. Let $\phi(u) = \min(u, 1 - \gamma)$, $0 < u < 1$; see Fig. 2.5. The statistic $V = \sum\phi(R_j/(n+1))s(X_j)$ assigns the rank to the $(1 - \gamma)100\%$ observations smallest in absolute value and $(1 - \gamma)$ times $s(x)$ to the $(\gamma)100\%$ observations with largest absolute value. Hence $V$ represents a mixture of $(1 - \gamma)100\%$ Wilcoxon and $(\gamma)100\%$ sign scores. Winsorization is a term coined by John Tukey to denote the process of replacing values beyond a point with the value at that point, for example the Winsorized mean. We see from Exercise 2.10.22 that the asymptotic parameters in Theorem 2.8.1 are $(1 - \gamma^2)/4$ and $(1 - \gamma)^2(1 + 2\gamma)/12$, and hence, with $\bar V = V/n$, $n^{1/2}(\bar V - (1 - \gamma^2)/4)/[(1 - \gamma)^2(1 + 2\gamma)/12]^{1/2}$ has an approximate standard normal distribution. A comparison of the exact and asymptotic parameters is discussed in Exercise 2.10.23.

The asymptotic tolerance to acceptance is $\epsilon$, defined by

$$\int_0^{1-\epsilon}\min(u, 1 - \gamma)\,du = \frac{1 - \gamma^2}{4}.$$

If $1 - \epsilon \le 1 - \gamma$ then $(1 - \epsilon)^2/2 = (1 - \gamma^2)/4$ and $\epsilon = 1 - [(1 - \gamma^2)/2]^{1/2}$. If $1 - \epsilon > 1 - \gamma$ then $(1 - \gamma)^2/2 + (1 - \gamma)(\gamma - \epsilon) = (1 - \gamma^2)/4$ and $\epsilon = (1 + \gamma)/4$. Note that in the first case $1 - \epsilon = [(1 - \gamma^2)/2]^{1/2} \le 1 - \gamma$ requires $\gamma \le 1/3$. Likewise, in the second case $\gamma > 1/3$. Hence the asymptotic tolerance to acceptance is

$$\epsilon = \begin{cases} 1 - [(1 - \gamma^2)/2]^{1/2}, & \gamma \le 1/3, \\ (1 + \gamma)/4, & \gamma > 1/3. \end{cases}$$

It is easy to check that the asymptotic tolerance to rejection is the same. Estimation tolerance is discussed in Example 2.8.4.

By letting $\gamma$ range from 0 to 1 we range from the Wilcoxon signed rank test to the sign test. Knowledge concerning the tail weight of the underlying population can guide the choice of $\gamma$; for example, something closer to a double exponential suggests $\gamma$ around 1. Rieder (1981, 1982) has developed an asymptotic robustness theory for rank tests. His work indicates that, in any case, some Winsorization may be desirable to achieve the stability properties discussed in his work. For a further discussion of Winsorized signed rank statistics, see Policello and Hettmansperger (1976) and Hettmansperger and Utts (1977).
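The closed-form asymptotic parameters quoted for the Winsorized scores can be verified by direct integration. This sketch assumes that the parameters of Theorem 2.8.1 are $\frac12\int_0^1\phi(u)\,du$ and $\frac14\int_0^1\phi^2(u)\,du$, the limits of $EV/n$ and $\operatorname{Var}V/n$ implied by (2.8.2); the function names and the chosen values of $\gamma$ are hypothetical.

```python
def integrate(g, lo, hi, m=20000):
    """Midpoint rule for the integral of g over [lo, hi] using m cells."""
    h = (hi - lo) / m
    return h * sum(g(lo + (k + 0.5) * h) for k in range(m))

def closed_form_parameters(gamma):
    """Asymptotic mean and variance quoted for V/n with Winsorized scores."""
    mean = (1.0 - gamma ** 2) / 4.0
    var = (1.0 - gamma) ** 2 * (1.0 + 2.0 * gamma) / 12.0
    return mean, var

def numeric_parameters(gamma):
    """(1/2) int phi and (1/4) int phi^2 for phi(u) = min(u, 1 - gamma)."""
    phi = lambda u: min(u, 1.0 - gamma)
    mean = 0.5 * integrate(phi, 0.0, 1.0)
    var = 0.25 * integrate(lambda u: phi(u) ** 2, 0.0, 1.0)
    return mean, var

for gamma in (0.0, 1.0 / 3.0, 0.5, 0.9):
    print(round(gamma, 3), closed_form_parameters(gamma), numeric_parameters(gamma))
```

At $\gamma = 0$ the parameters reduce to the Wilcoxon values $1/4$ and $1/12$.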

Example 2.8.3. Modified Sign Statistics. Let $\phi(u) = 0$ if $0 < u \le 1 - \gamma$ and 1 if $1 - \gamma < u < 1$. Let $V = \sum\phi(R_j/(n+1))s(X_j)$; then

$V$ = number of positive observations for which the ranks of their absolute values are greater than $(1 - \gamma)(n + 1)$.

When $\gamma = 1$, $V = S$. Note that

$$V = \sum_{j=[(1-\gamma)(n+1)]+1}^{n} W_j,$$

where $[\cdot]$ is the greatest integer function and $W_j$ is defined in (2.2.3). It now follows from Theorem 2.2.1 that, under the null hypothesis, $V$ has a binomial distribution with parameters $p = 1/2$ and $n - [(1 - \gamma)(n + 1)]$, so that critical values are easily determined.

By varying $\gamma$, we generate the family of modified sign statistics. We will see later that the efficiency can be increased over that of the sign test by judicious choice of $\gamma$. Unlike the sign statistic, under alternative hypotheses the distribution is no longer binomial; see Example 2.4.2. The asymptotic distribution under the null hypothesis is discussed in Exercise 2.10.24. For a further discussion of these statistics see Noether (1973) and Markowski and Hettmansperger (1982).
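Because the null distribution is binomial, exact tail probabilities for a modified sign statistic are easy to compute. The sketch below is an illustration under stated assumptions: the function names and the data are hypothetical, no ties among the absolute values are assumed, and `math.floor` stands in for the greatest integer function (floating-point rounding could matter when $(1 - \gamma)(n + 1)$ is an exact integer).

```python
import math

def modified_sign_statistic(xs, gamma):
    """Return (V, m): V counts positive observations whose absolute rank exceeds
    [(1 - gamma)(n + 1)]; m = n - [(1 - gamma)(n + 1)] is the binomial sample size."""
    n = len(xs)
    cutoff = math.floor((1.0 - gamma) * (n + 1))
    order = sorted(range(n), key=lambda j: abs(xs[j]))
    v = sum(1 for rank, j in enumerate(order, start=1)
            if rank > cutoff and xs[j] > 0)
    return v, n - cutoff

def upper_tail(v, m):
    """P(B >= v) for B binomial with parameters m and 1/2."""
    return sum(math.comb(m, k) for k in range(v, m + 1)) / 2.0 ** m

sample = [1.2, -0.5, 3.1, -2.4, 0.7]            # hypothetical data
print(modified_sign_statistic(sample, 1.0))     # gamma = 1 gives the sign statistic
v, m = modified_sign_statistic(sample, 0.5)
print(v, m, upper_tail(v, m))
```

With $\gamma = 1$ every observation is counted and the statistic reduces to $S$ with $n$ binomial trials.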

We now return to the general signed rank score statistic $V = \sum a(R_j)s(X_j)$ given in (2.8.1) and consider estimates derived from $V$. Define, as usual,

$$V(\theta) = \sum a(R_j(\theta))s(X_j - \theta), \tag{2.8.6}$$

where $R_j(\theta)$ is the rank of $|X_j - \theta|$ among $|X_1 - \theta|, \ldots, |X_n - \theta|$. The order statistics of the sample are given by $X_{(1)} \le \cdots \le X_{(n)}$. We now present a theorem due to Bauer (1972) that characterizes $V(\theta)$ in terms of its behavior at the Walsh averages, which we will write as $(X_{(i)} + X_{(j)})/2$, $1 \le i \le j \le n$.

Theorem 2.8.2. As a function of $\theta$, $V(\theta)$ is a nonincreasing step function such that

$V(\theta)$ decreases by the amount $a_1$ at each $X_{(j)}$, $j = 1, \ldots, n$;

$V(\theta)$ decreases by the amount $a_{j-i+1} - a_{j-i}$ at $(X_{(i)} + X_{(j)})/2$, $i < j$.

Proof. First consider values of $\theta$ just to the left of $X_{(j)}$. The value $X_{(j)} - \theta$ is positive, small, and has rank 1 among the absolute values. Hence $a_1$ is included in $V(\theta)$. However, as soon as we consider values of $\theta$ just to the right of $X_{(j)}$, we see that $X_{(j)} - \theta$ is negative but its absolute value still has rank 1. Hence $a_1$ is not included in $V(\theta)$, and so $V(\theta)$ must decrease by the amount $a_1$ for each $j = 1, \ldots, n$.

Next we consider $X_{(i)}$ and $X_{(j)}$ for some pair $i < j$. Consider values of $\theta$ just to the left of $(X_{(i)} + X_{(j)})/2$. The value of $X_{(j)} - \theta$ is a bit larger than $|X_{(i)} - \theta|$ and also larger than $|X_{(k)} - \theta|$, $i < k < j$, for those order statistics between $X_{(i)}$ and $X_{(j)}$. Hence the rank of $X_{(j)} - \theta$ is seen to be $j - (i - 1) = j - i + 1$, and since $X_{(j)} - \theta > 0$, $a_{j-i+1}$ is included in the computation of $V(\theta)$. For values of $\theta$ just to the right of $(X_{(i)} + X_{(j)})/2$ the situation is reversed. Now $|X_{(i)} - \theta|$ is slightly larger than $X_{(j)} - \theta$, and the rank of $X_{(j)} - \theta$ has decreased to $j - i$. Hence, at $(X_{(i)} + X_{(j)})/2$ the value of $V(\theta)$ must decrease in a jump of size $a_{j-i+1} - a_{j-i}$.

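Define $T_{ij}(\theta) = 1$ if $(X_{(i)} + X_{(j)})/2 > \theta$ and 0 otherwise; then we can construct the counting form of $V(\theta)$. Since $V(\theta)$ is a step function, its value (height) is the sum of accumulated steps at Walsh averages to the right of $\theta$. Hence

$$V(\theta) = \sum\sum_{i \le j}(a_{j-i+1} - a_{j-i})\,T_{ij}(\theta). \tag{2.8.7}$$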

From Exercise 2.10.2, under the model that we are sampling from a symmetric distribution $F(x - \theta)$, $F \in \Omega_s$, we see that $V(\theta)$ is symmetrically distributed about $\sum a_i/2$. Generally a Hodges-Lehmann estimate, Definition 1.5.1, can be determined from $V(\hat\theta) \doteq \sum a_i/2$.

If we restrict attention to scores such that

$$a_{j-i+1} - a_{j-i} = \frac{1}{n+1} \text{ or } 0, \tag{2.8.8}$$

then the steps are constant and occur at selected Walsh averages; see (2.4.1) in Section 2.4. Let the set $B$ be defined by

$$B = \Big\{(i, j) : i \le j,\ a_{j-i+1} - a_{j-i} = \frac{1}{n+1}\Big\}. \tag{2.8.9}$$

Then $B$ specifies the set of Walsh averages where steps occur. We can now write $\bar V(\theta) = V(\theta)/n$ as follows:

$$\bar V(\theta) = \frac{1}{n(n+1)}\sum\sum_{i \le j,\ (i,j) \in B} T_{ij}(\theta) = \frac{1}{n(n+1)}\,\#\Big[\frac{X_{(i)} + X_{(j)}}{2} > \theta,\ (i, j) \in B\Big]. \tag{2.8.10}$$

The restriction in (2.8.8) reduces the statistic to counting Walsh averages.
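The counting form of $V(\theta)$ for general scores can be checked numerically against the definition (2.8.6). The sketch below assumes the counting form sums the steps $a_{j-i+1} - a_{j-i}$ (with $a_0 = 0$) over pairs $i \le j$ of order statistics whose Walsh average exceeds $\theta$. The function names, data, and scores are hypothetical, and $\theta$ is chosen to avoid ties.

```python
def v_direct(xs, a, theta):
    """V(theta) = sum_j a[R_j(theta)] * s(X_j - theta); a[0] = 0, a[1..n] are the scores."""
    n = len(xs)
    order = sorted(range(n), key=lambda j: abs(xs[j] - theta))
    return sum(a[rank] for rank, j in enumerate(order, start=1)
               if xs[j] - theta > 0)

def v_counting(xs, a, theta):
    """Sum of steps a[j-i+1] - a[j-i] over Walsh averages of order statistics above theta."""
    x = sorted(xs)
    n = len(x)
    return sum(a[j - i + 1] - a[j - i]
               for i in range(n) for j in range(i, n)
               if (x[i] + x[j]) / 2.0 > theta)

sample = [0.3, -1.1, 2.4, 0.9, -0.2]            # hypothetical data
scores = [0.0, 0.1, 0.25, 0.5, 0.7, 1.3]        # hypothetical nondecreasing scores a_0..a_5
for theta in (-2.0, -0.5, 0.1, 0.5, 1.0, 3.0):
    print(theta, round(v_direct(sample, scores, theta), 10),
          round(v_counting(sample, scores, theta), 10))
```

The two columns agree at every $\theta$ tried, and both decrease to 0 once $\theta$ exceeds the largest observation.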

In the next example we discuss the point and interval estimates of 0 
derived from the Winsorized Wilcoxon statistic of Example 2.8.2. We also 
derive the estimation tolerance and compare it to that of the trimmed 
mean. 


Example 2.8.4. The Winsorized signed rank statistics in Example 2.8.2 provide an example of (2.8.10) with $a_i = \phi(i/(n+1)) = \min(i/(n+1), 1 - \gamma)$. Hence, if $[\cdot]$ denotes the greatest integer function,

$$a_i = \begin{cases} \dfrac{i}{n+1}, & 0 < \dfrac{i}{n+1} \le 1 - \gamma, \text{ i.e., } 0 < i \le [(1 - \gamma)(n + 1)], \\[2mm] 1 - \gamma, & 1 - \gamma < \dfrac{i}{n+1} < 1, \text{ i.e., } [(1 - \gamma)(n + 1)] < i < n + 1, \end{cases}$$

and

$$a_{j-i+1} - a_{j-i} = \frac{1}{n+1} \quad\text{if } j - i \le [(1 - \gamma)(n + 1)]. \tag{2.8.11}$$

[If $(1 - \gamma)(n + 1)$ is an integer, it must be reduced by 1 in (2.8.11).] Thus we see that Winsorization of the ranks results in the restriction to Walsh averages that are formed from order statistics that are not too far apart in the sample. For example, $(X_{(1)} + X_{(n)})/2$, the midrange, would be the first to be excluded.

The Hodges-Lehmann estimate based on the Winsorized signed rank statistic is

$$\hat\theta_\gamma = \operatorname{med}_{(i,j) \in B}\frac{X_{(i)} + X_{(j)}}{2},$$

where $B = \{(i, j) : j - i \le [(1 - \gamma)(n + 1)]\}$. Theorem 2.4.1 can be applied to show that $\hat\theta_\gamma$ is symmetrically distributed about $\theta$.

Now let $N_\gamma = [(1 - \gamma)(n + 1)]$; then there are $N^* = \{n(n + 1) - (n - N_\gamma - 1)(n - N_\gamma)\}/2$ pairs $(i, j)$ such that $j - i \le N_\gamma$. Let $V^* = n(n + 1)\bar V$ in (2.8.10), so $V^* = \#[(X_{(i)} + X_{(j)})/2 > 0, (i, j) \in B]$, and $V^*$ is the number of positive Walsh averages subject to the restriction imposed by $B$. From the asymptotic normality of $\bar V$ we can write (from Example 2.8.2)

$$P(V^* \le c) = P\big(\bar V \le c/n(n+1)\big) \doteq \Phi\Bigg(\frac{\sqrt{n}\,[c/n(n+1) - (1 - \gamma^2)/4]}{\sqrt{(1 - \gamma)^2(1 + 2\gamma)/12}}\Bigg) = \alpha/2,$$

and equating the standardized critical value with $-z_{\alpha/2}$, the lower standard normal $\alpha/2$ percentile,

$$c \doteq \frac{n(n+1)(1 - \gamma^2)}{4} - z_{\alpha/2}\Bigg[\frac{(1 - \gamma)^2(1 + 2\gamma)\,n(n+1)^2}{12}\Bigg]^{1/2}.$$

The $(1 - \alpha)100\%$ approximate confidence interval for $\theta$, based on the Winsorized signed rank statistic, is then $[W_{(c+1)}, W_{(N^*-c)})$, where $W_{(1)} \le \cdots \le W_{(N^*)}$ are the ordered Walsh averages determined by $B$; compare this to (2.3.3).

We now illustrate the calculations in a simple example using the first six observed differences in Table 2.2. Take $\gamma = 1/3$, so $N_\gamma = [14/3] = 4$ and $N^* = 20$. Hence we need the Walsh averages that satisfy $j - i \le 4$. Arrange the observed differences in order and form Walsh averages of pairs at intersections of diagonals as follows:

    1       11       17       32       69       90
        6       14      24.5     50.5     79.5
            9      21.5      43       61
               16.5      40      53.5
                    35      50.5

For example, $21.5 = (11 + 32)/2$. The restriction $j - i \le 4$ means we need the five rows displayed. The estimate $\hat\theta$ is $(W_{(10)} + W_{(11)})/2 = (32 + 35)/2 = 33.5$. For an approximate 90% confidence interval take $z_{\alpha/2} = 1.645$ and compute $c = 2.33$; so, to be conservative, we use $c = 2$. The interval is then $[W_{(3)}, W_{(18)}) = [9, 69)$.

Figure 2.6. $j - i \le N_\gamma = [(1 - \gamma)(n + 1)]$.
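The arithmetic of this small example can be reproduced as follows. The sketch is hedged in several ways: the six differences (1, 11, 17, 32, 69, 90) are the values as read from the Walsh-average display; the restriction is taken to be $j - i \le N_\gamma$ with $N_\gamma = [(1-\gamma)(n+1)]$, which yields the 20 averages used in the text; the interval endpoints are assumed to be the $(c+1)$st and $(N^* - c)$th ordered restricted Walsh averages; and the function name is hypothetical.

```python
import math
import statistics

def restricted_walsh_averages(xs, gamma):
    """Sorted Walsh averages of order statistics with j - i <= [(1 - gamma)(n + 1)]."""
    x = sorted(xs)
    n = len(x)
    n_gamma = math.floor((1.0 - gamma) * (n + 1))
    return sorted((x[i] + x[j]) / 2.0
                  for i in range(n) for j in range(i, n)
                  if j - i <= n_gamma)

diffs = [1, 11, 17, 32, 69, 90]          # differences as read from the display above
gamma, z = 1.0 / 3.0, 1.645
n = len(diffs)

w = restricted_walsh_averages(diffs, gamma)
estimate = statistics.median(w)

# Approximate critical value from the normal approximation to V*.
c = (n * (n + 1) * (1 - gamma ** 2) / 4.0
     - z * math.sqrt((1 - gamma) ** 2 * (1 + 2 * gamma) * n * (n + 1) ** 2 / 12.0))
c_used = math.floor(c)                   # rounding down is the conservative choice
lower, upper = w[c_used], w[len(w) - c_used - 1]

print(len(w), estimate, round(c, 2), (lower, upper))
```

The run reproduces $N^* = 20$, the estimate 33.5, and $c \doteq 2.33$ from the text.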

We now consider the asymptotic tolerance of $\hat\theta$, the estimate derived from the Winsorized signed rank statistic. Figure 2.6 helps identify the $(i, j)$ pairs in $B$.

The number of lattice points in the trapezoidal region is $\#B$, needed in Theorem 2.4.2. Further, $\#B = N^* = \{n(n + 1) - (n - N_\gamma - 1)(n - N_\gamma)\}/2$, where $N_\gamma = [(1 - \gamma)(n + 1)]$. We are interested in the asymptotic tolerance, so in the following argument we replace $n + 1$ by $n$ and $N_\gamma = [(1 - \gamma)(n + 1)]$ by $(1 - \gamma)n$. Then $\#B \doteq \{n^2 - (n - N_\gamma)^2\}/2 = (1 - \gamma^2)n^2/2$. Next we need $\#B_{a+1}$, and again we consider $\#B_a$ since the 1 will not matter much for large $n$. The hatched region in Fig. 2.6 contains the lattice points defined by $B_a$ when $a < n - N_\gamma = n\gamma$, and the cross-hatched region corresponds to $B_a$ when $a > n\gamma$. The area of these two regions approximates $\#B_a$ as

$$\#B_a \doteq \begin{cases} (n - a)^2/2 & \text{if } a > n\gamma, \\ (1 - \gamma^2)n^2/2 - a(1 - \gamma)n & \text{if } a \le n\gamma. \end{cases}$$

The case $a \le n\gamma$ is computed from $N_\gamma^2/2$ for the triangle and $N_\gamma(n - N_\gamma - a)$ for the parallelogram. We are now ready to solve for $a$ such that $\#B_a \doteq \#B/2$, or

$$\frac{(1 - \gamma^2)n^2}{4} = \begin{cases} (n - a)^2/2 & \text{if } a > n\gamma, \\ (1 - \gamma^2)n^2/2 - a(1 - \gamma)n & \text{if } a \le n\gamma. \end{cases}$$

Under the condition $a > n\gamma$, we have a quadratic equation in $a$ with (admissible) solution

$$\frac an = 1 - \Big[\frac{1 - \gamma^2}{2}\Big]^{1/2}.$$

This is an approximation to the tolerance provided $a/n > \gamma$, that is, provided $1 - [(1 - \gamma^2)/2]^{1/2} > \gamma$. This holds provided $\gamma < 1/3$. The second condition, $a \le n\gamma$, yields

$$\frac an = \frac{1 + \gamma}{4}$$

and is compatible with $\gamma \ge 1/3$. Hence the asymptotic tolerance of $\hat\theta$ is

$$\tau = \begin{cases} 1 - [(1 - \gamma^2)/2]^{1/2} & \text{if } \gamma < 1/3, \\ (1 + \gamma)/4 & \text{if } \gamma \ge 1/3. \end{cases}$$

Note that if $\gamma = 0$ then $\tau = 1 - (1/2)^{1/2}$, the tolerance of $\hat\theta = \operatorname{med}(X_i + X_j)/2$, $i \le j$, and if $\gamma = 1$ then $\tau = 1/2$, the tolerance of $\hat\theta = \operatorname{med}X_i$. The result also matches the asymptotic testing tolerance in Example 2.8.2.
Recall from Example 1.6.1 that the trimmed mean based on the middle $n - 2[n\alpha]$ observations has tolerance $\tau(\bar X_\alpha) = \alpha$. If we compare this tolerance to that of $\hat\theta_{2\alpha}$, the estimate derived from a Winsorized Wilcoxon statistic using $(2\alpha)100\%$ sign scores and $(1 - 2\alpha)100\%$ Wilcoxon scores, we find $\tau(\hat\theta_{2\alpha}) > \tau(\bar X_\alpha)$ for every $\alpha$, $0 < \alpha < 1/2$. Hence these estimates are more resistant to outliers than the trimmed means. See Hettmansperger and Utts (1977) for further discussion.
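The comparison with the trimmed mean can be tabulated from the closed-form tolerance. The sketch below assumes the asymptotic tolerance $1 - [(1 - \gamma^2)/2]^{1/2}$ for $\gamma \le 1/3$ and $(1 + \gamma)/4$ otherwise, as in the Winsorized testing tolerance of Example 2.8.2; the function name and the grid of $\alpha$ values are hypothetical.

```python
import math

def winsorized_tolerance(gamma):
    """Asymptotic tolerance of the estimate derived from Winsorized signed rank scores."""
    if gamma <= 1.0 / 3.0:
        return 1.0 - math.sqrt((1.0 - gamma ** 2) / 2.0)
    return (1.0 + gamma) / 4.0

# Trimmed mean with trimming proportion alpha has tolerance alpha; compare with gamma = 2*alpha.
for alpha in (0.05, 0.10, 0.20, 0.30, 0.40):
    print(alpha, round(winsorized_tolerance(2.0 * alpha), 3))
```

For every $\alpha$ in the table the second column exceeds the first, and the two branches of the formula agree at $\gamma = 1/3$, where both give $1/3$.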

A similar analysis of estimates derived from the modified sign statistics
of Example 2.8.3 is outlined in Exercise 2.10.25. The structure of these
estimates is simpler than that of the estimates derived from the Winsorized
signed rank statistics.

For convenience in computing the influence curve for an estimate
derived from $V$, we will put the statistic into a more symmetric form. Let
$\operatorname{sgn}(x) = -1$ if $x < 0$, $0$ if $x = 0$, and $+1$ if $x > 0$. Then define

$$V^*(\theta) = \sum_{i=1}^{n} \phi\left(\frac{R_i(\theta)}{n+1}\right)\operatorname{sgn}(X_i - \theta). \qquad (2.8.12)$$

Note that $V^*(\theta) = 2V(\theta) - \sum_{i=1}^{n}\phi(i/(n+1))$, so the Hodges-Lehmann
estimate mentioned under (2.8.7) can be defined equivalently by $V^*(\hat\theta) = 0$.
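As an illustration of solving $V^*(\theta) = 0$ numerically, the sketch below evaluates the symmetric statistic for an arbitrary score function and locates its sign change by bisection. It assumes the scores are nonnegative and nondecreasing, so that $V^*(\theta)$ is a nonincreasing step function of $\theta$; the sample and all function names are hypothetical. For Wilcoxon scores the answer can be compared with the median of the Walsh averages.

```python
import statistics

def v_star(theta, xs, phi):
    """V*(theta) = sum_i phi(R_i(theta)/(n+1)) * sgn(X_i - theta)."""
    n = len(xs)
    devs = [x - theta for x in xs]
    total = 0.0
    for d in devs:
        # R_i(theta): number of |X_j - theta| not exceeding |X_i - theta|
        rank = sum(1 for e in devs if abs(e) <= abs(d))
        sign = (d > 0) - (d < 0)
        total += phi(rank / (n + 1)) * sign
    return total

def solve_v_star(xs, phi, iters=60):
    """Bisection for the point where the step function V* changes sign."""
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if v_star(mid, xs, phi) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def walsh_median(xs):
    """Median of the Walsh averages (X_i + X_j)/2, i <= j."""
    n = len(xs)
    return statistics.median(
        (xs[i] + xs[j]) / 2.0 for i in range(n) for j in range(i, n)
    )

sample = [0.3, 1.9, -0.8, 2.5, 1.0]   # illustrative data, not from the text
wilcoxon = lambda u: u                # Wilcoxon scores phi(u) = u
estimate = solve_v_star(sample, wilcoxon)
```

With $n = 5$ there are 15 Walsh averages, an odd number, so the sign change occurs at a single point. When the number of Walsh averages is even, $V^*$ can vanish on an interval and bisection returns only one point of it.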

Let $H(x) = F(x - \theta)$ be the underlying cdf and $H_n(x)$ denote the
empirical cdf. As in Example 2.4.2, we define the functional $\theta = T(H)$
implicitly. We begin by expressing $V^*(\theta)$ in terms of the empirical cdf. First
extend the definition of $\phi$ in Definition 2.8.2 from $(0,1)$ to $(-1,1)$ by
$\phi(-u) = -\phi(u)$.


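Theorem 2.8.3. The equation $V^*(\theta) = 0$ is asymptotically equivalent to

$$\int_{-\infty}^{\infty} \phi\left(\frac{n}{n+1}\left[H_n(x) - H_n(-x + 2\theta)\right]\right) dH_n(x) = 0 \qquad (2.8.13)$$

and the functional $\theta = T(H)$ is defined by

$$\int_{-\infty}^{\infty} \phi\left(H(x) - H(-x + 2\theta)\right) dH(x) = 0. \qquad (2.8.14)$$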


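Proof. The rank $R_i(\theta)$ is equal to the number of $X_j$'s such that
$|X_j - \theta| \le |X_i - \theta|$. We can then write (2.8.12) as

$$V^*(\theta) = \sum_{i:\,\theta < X_i} \phi\left(\frac{1}{n+1}\,\#\{X_j : -X_i + 2\theta \le X_j \le X_i\}\right) - \sum_{i:\,\theta > X_i} \phi\left(\frac{1}{n+1}\,\#\{X_j : X_i \le X_j \le -X_i + 2\theta\}\right).$$

Using the definition of the empirical cdf, we have

$$V^*(\theta) \doteq \sum_{i:\,\theta < X_i} \phi\left(\frac{n}{n+1}\left[H_n(X_i) - H_n(-X_i + 2\theta)\right]\right) - \sum_{i:\,\theta > X_i} \phi\left(\frac{n}{n+1}\left[H_n(-X_i + 2\theta) - H_n(X_i)\right]\right).$$

Since $\phi(-u) = -\phi(u)$, the two sums can be combined, and the first
equation follows by representing the sum as a Stieltjes integral. The second
follows by letting $n \to \infty$.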

Hence the location functional $T(H)$ is implicitly defined by
(2.8.14), and the Hodges-Lehmann estimate $\hat\theta$ is (asymptotically) equivalent
to the solution of (2.8.13). We now derive the influence curve for the
functional $\theta = T(H)$. Typical scores functions used for the symmetric
location model satisfy

$$\phi(2u - 1) + \phi(1 - 2u) = 0. \qquad (2.8.15)$$

In the next theorem we derive the influence curve under this condition.
All examples in this chapter satisfy (2.8.15).

Theorem 2.8.4. Suppose $F \in \Omega_s$, and $\phi$ satisfies Definition 2.8.2, is
differentiable, and satisfies (2.8.15). Then the functional $T(F)$ has influence
curve

$$\Omega(y) = \frac{\phi(2F(y) - 1)}{2\int_{-\infty}^{\infty} \phi'(2F(x) - 1) f^2(x)\,dx}.$$


Proof. We begin with $H(x) = F(x - \theta)$ and define $T(H)$ through (2.8.14)
by

$$\int_{-\infty}^{\infty} \phi\left[H(x) - H(-x + 2T(H))\right] dH(x) = 0. \qquad (2.8.16)$$

Define $H_t(x) = H(x) + t(\delta_y(x) - H(x))$ and apply Definition 1.6.4.
Substituting $H_t(x)$ into (2.8.16), we get

$$\int_{-\infty}^{\infty} \phi\left[H_t(x) - H_t(-x + 2T(H_t))\right] dH(x) + t\int_{-\infty}^{\infty} \phi\left[H_t(x) - H_t(-x + 2T(H_t))\right] d\left[\delta_y(x) - H(x)\right] = 0.$$

We now differentiate with respect to $t$ and then set $t = 0$. Further, the
derivative simplifies if we take $\theta = 0$, without loss of generality. Hence the
derivative is presented with $t = 0$ and $\theta = 0$, so $H_t(x)$ is replaced by $F(x)$.
Finally, recall that $\Omega(y)$ is the derivative of $T(H_t)$ evaluated at $t = 0$. Hence

$$\int_{-\infty}^{\infty} \phi'[F(x) - F(-x)]\left\{\delta_y(x) - F(x) - f(-x)\,2\Omega(y) - \delta_y(-x) + F(-x)\right\} dF(x)$$
$$\quad + \int_{-\infty}^{\infty} \phi[F(x) - F(-x)]\,d\delta_y(x) - \int_{-\infty}^{\infty} \phi[F(x) - F(-x)]\,dF(x) = 0.$$

Now, $F(x) - F(-x) = 2F(x) - 1$ and $f(-x) = f(x)$; hence

$$-2\Omega(y)\int_{-\infty}^{\infty} \phi'(2F(x) - 1) f^2(x)\,dx + \int_{-\infty}^{\infty} \phi'(2F(x) - 1)\,\delta_y(x)\,dF(x)$$
$$\quad - \int_{-\infty}^{\infty} \phi'(2F(x) - 1)(2F(x) - 1)\,dF(x) - \int_{-\infty}^{\infty} \phi'(2F(x) - 1)\,\delta_y(-x)\,dF(x)$$
$$\quad + \int_{-\infty}^{\infty} \phi(2F(x) - 1)\,d\delta_y(x) - \int_{-\infty}^{\infty} \phi(2F(x) - 1)\,dF(x) = 0.$$

Make a change of variable $t = -x$ in the fourth integral; then $\phi(2u - 1)
= -\phi(1 - 2u)$ and $\phi'(2u - 1) = \phi'(1 - 2u)$ imply that the second and
fourth integrals cancel. The third integral is $2^{-1}\int_{-1}^{1} u\phi'(u)\,du = 0$, after
using an integration by parts. The last integral is 0 also, because $\phi(u) =
-\phi(-u)$. Now solve for $\Omega(y)$ to get

$$\Omega(y) = \frac{\int_{-\infty}^{\infty} \phi(2F(x) - 1)\,d\delta_y(x)}{2\int_{-\infty}^{\infty} \phi'(2F(x) - 1) f^2(x)\,dx} = \frac{\phi(2F(y) - 1)}{2\int_{-\infty}^{\infty} \phi'(2F(x) - 1) f^2(x)\,dx}.$$

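Huber (1981, p. 64) gives the influence curve in the general case of
asymmetric $F$. He points out that asymmetric $F$ may result in arbitrarily
large values of the influence curve. This implies that rank estimates, such as
the median of the Walsh averages, may lose their superior robustness in the
face of asymmetric contamination.

The influence function suggests the asymptotic distribution of
$n^{1/2}(\hat\theta - \theta)$; recall (1.6.2). We need

$$\int_{-\infty}^{\infty} \Omega^2(y)\,dF(y).$$

Note that

$$\int_{-\infty}^{\infty} \phi^2(2F(y) - 1)\,dF(y) = \int_0^1 \phi^2(u)\,du$$

and hence

$$\int_{-\infty}^{\infty} \Omega^2(y)\,dF(y) = \frac{\int_0^1 \phi^2(u)\,du}{\left[2\int_{-\infty}^{\infty} \phi'(2F(x) - 1) f^2(x)\,dx\right]^2}.$$

Then

$$n^{1/2}(\hat\theta - \theta) \xrightarrow{D} Z \sim n\left(0, \int_{-\infty}^{\infty} \Omega^2(y)\,dF(y)\right). \qquad (2.8.17)$$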


Example 2.8.5. The Winsorized Wilcoxon estimate in Example 2.8.4 is
generated by $\phi(u) = \min(u, 1 - \gamma)$, $0 < u < 1$. Let $k$ be defined by
$F(-k) = \gamma/2$. Then $\phi(2F(x) - 1) = \min(2F(x) - 1, 1 - \gamma) = 2F(x) - 1$ if
$0 < 2F(x) - 1 < 1 - \gamma$, or $0 < x < F^{-1}(1 - \gamma/2) = k$, and $1 - \gamma$ if $x > k$.
Extend $\phi(u)$ to $(-1, 1)$ by $\phi(-u) = -\phi(u)$; then

$$\Omega(y) = \begin{cases} \dfrac{(1 - \gamma)\operatorname{sgn}(y)}{2\int_{-k}^{k} f^2(t)\,dt} & |y| > k \\[2ex] \dfrac{2F(y) - 1}{2\int_{-k}^{k} f^2(t)\,dt} & |y| < k. \end{cases}$$

Hence the influence is not only bounded, it is Winsorized. Sensitivity
curves and further discussion can be found in Hettmansperger and Utts
(1977). The results of Jaeckel (1971) indicate that estimates with Winsorized
influence curves have minimax asymptotic variance for contamination
models; see Example 2.9.5. See also Huber (1981, p. 97-99) and Rieder
(1981).
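To see the Winsorized shape concretely, the sketch below evaluates the influence curve of Example 2.8.5 when the underlying distribution is taken to be standard normal. That choice of $F$ is an assumption made only for illustration; it gives the closed form $\int_{-k}^{k}\psi^2(t)\,dt = [2\Phi(k\sqrt{2}) - 1]/(2\sqrt{\pi})$ for the denominator. The function name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def winsorized_wilcoxon_influence(y, gamma):
    """Influence curve at the standard normal for scores min(u, 1 - gamma)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    k = STD_NORMAL.inv_cdf(1.0 - gamma / 2.0)        # F(-k) = gamma/2
    # 2 * integral over (-k, k) of psi(t)^2, with psi(t)^2 = exp(-t^2)/(2*pi)
    denom = (2.0 * STD_NORMAL.cdf(k * math.sqrt(2.0)) - 1.0) / math.sqrt(math.pi)
    if abs(y) < k:
        numer = 2.0 * STD_NORMAL.cdf(y) - 1.0        # central, Wilcoxon-like part
    else:
        numer = math.copysign(1.0 - gamma, y)        # flat, Winsorized part
    return numer / denom

curve = [(y / 10.0, winsorized_wilcoxon_influence(y / 10.0, 0.2))
         for y in range(-40, 41)]
```

The list `curve` traces $\Omega(y)$ on $[-4, 4]$ for $\gamma = 0.2$: it rises like $2\Phi(y) - 1$ in the middle and is constant beyond $\pm k$.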

Example 2.8.6. The normal scores estimate is determined by
$V(\hat\theta) = \frac{1}{2}\sum_{i=1}^{n} \Phi_+^{-1}(i/(n+1))$. Recall, from Example 2.8.1, the
score function can be written as $\phi(u) = \Phi_+^{-1}(u) = \Phi^{-1}[(u + 1)/2]$. To compute
the influence curve we need

$$\phi'(u) = \frac{1}{2\psi(\Phi^{-1}[(u + 1)/2])}$$

where $\psi(x) = \Phi'(x)$, the $n(0, 1)$ pdf. Hence

$$\Omega(y) = \frac{\Phi^{-1}(F(y))}{\int_{-\infty}^{\infty} \left[f^2(x)/\psi(\Phi^{-1}(F(x)))\right] dx}.$$

Now suppose the underlying distribution is normal so that $F = \Phi$. Then the
denominator of $\Omega(y)$ is 1, and so $\Omega(y) = y$, the same unbounded influence
curve as $\bar X$! The unusual feature of the normal scores estimate is that it has
positive asymptotic estimation tolerance of .239 (see Huber, 1981, p. 67)
and yet has unbounded influence. In the $\epsilon$-contaminated normal model the
asymptotically minimax rank estimate is the Hodges-Lehmann estimate
generated from a Winsorized normal scores function. Winsorization results
in bounded influence. For further details see Huber (1981, Section 4.7).


2.9. EFFICIENCY OF GENERAL SCORES STATISTICS 

The previous section provided a rigorous development of the asymptotic 
distribution theory needed to develop tests and estimates from a general 
scores statistic. In this section we discuss, without proof, Theorem 2.6.1 and
then apply it to find the efficacy of a general scores test. We then establish 
an upper bound on the efficacy and find the score function that generates 
the maximum efficacy for a given distribution. We also find the greatest 
lower bound on the efficiency of the normal scores tests in Example 2.8.1 
relative to the t test, a result similar to the bound on the efficiency of the T 
relative to t given in Theorem 2.6.3. We introduce an ordering on the 
tailweight of distributions and use it to compare Wilcoxon to normal scores 
procedures. Finally, we develop asymptotic maximin tests and minimax 
asymptotic variance estimates in the contamination model. 

Definition 2.9.1. The expression

$$I(f) = \int_{-\infty}^{\infty} \left[\frac{f'(x)}{f(x)}\right]^2 f(x)\,dx$$

is called the Fisher information in $f$, provided $f$ is absolutely continuous
and $I(f) < \infty$. See Huber (1981, p. 77) for an extension to the case
$I(f) = \infty$.

Hajek and Sidak (1967, p. 220) show that if a score-generating function
$\phi(u)$ satisfies the conditions in Definition 2.8.2, then Theorem 2.6.1 holds
for the statistic generated by $\phi(u)$ provided, in addition, that the sampled
distribution has finite Fisher information; see Definition 2.9.1. Namely, if
$\theta_n = \theta/n^{1/2}$, then the asymptotic local power of the test along the
sequence $\theta_n$ is $1 - \Phi(Z_\alpha - \theta c)$,
where $c$ is the efficacy defined in condition 5 of the Pitman regularity
conditions in Section 2.6. To compute $c$ we need the asymptotic mean
under a fixed alternative.

Let $X_1, \ldots, X_n$ be a random sample from $G(x) = F(x - \theta)$, $F \in \Omega_s$,
and let $G_n(x) = (\#X_i \le x)/n$, the empirical distribution function. Then
$H(x) = P(|X| \le x) = G(x) - G(-x)$ and $H_n(x) = (\#|X_i| \le x)/n$. If
$X_j > 0$ then the rank among the absolute values can be written as
$R_j = nH_n(X_j)$. The general scores statistic, (2.8.4), can be written

$$\bar V = \int_0^{\infty} \phi\left(\frac{n}{n+1}H_n(x)\right) dG_n(x).$$

Since $H_n(x)$ and $G_n(x)$ converge in probability to $H(x)$ and $G(x)$, it can be
shown under regularity conditions discussed later that

$$\bar V \xrightarrow{P} \int_0^{\infty} \phi(H(x))\,dG(x).$$

From the definitions of $H(x)$ and $G(x)$, it can be further shown that the
asymptotic mean $\mu(\theta)$ suggested by the stochastic limit of $\bar V$ is given by

$$\mu(\theta) = \int_0^{\infty} \phi\left[F(x - \theta) - F(-x - \theta)\right] f(x - \theta)\,dx = \int_{-\theta}^{\infty} \phi\left[F(x) - F(-x - 2\theta)\right] f(x)\,dx. \qquad (2.9.2)$$

To compute the efficacy we need $\mu'(0)$ given by

$$\mu'(0) = 2\int_0^{\infty} \phi'(2F(x) - 1) f^2(x)\,dx,$$

since $F \in \Omega_s$. Furthermore, $\sigma^2(0) = \int_0^1 \phi^2(u)\,du/4$; hence

$$c = \frac{2\int_{-\infty}^{\infty} \phi'(2F(x) - 1) f^2(x)\,dx}{\left[\int_0^1 \phi^2(u)\,du\right]^{1/2}}. \qquad (2.9.3)$$

The derivation which leads to (2.9.3) is made rigorous by Puri and Sen
(1971, Section 3.6). They place more restrictive conditions on $\phi(u)$ than are
needed by Hajek and Sidak (1967); however, they do not need finite Fisher
information. The main assumption made by Puri and Sen is that
$|\phi(u)| \le K[u(1 - u)]^{\delta - 1/2}$ and $|\phi'(u)| \le K[u(1 - u)]^{\delta - 3/2}$ for some $\delta > 0$.

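We next derive an alternative form of $c$. Let $I = \int_0^{\infty} \phi'(2F(y) - 1) f^2(y)\,dy$
and make the change of variable $x = 2F(y) - 1$. Then

$$I = \frac{1}{2}\int_0^1 \phi'(x)\, f\left(F^{-1}\left(\frac{x + 1}{2}\right)\right) dx.$$

Now $F(F^{-1}(x)) = x$, so differentiating both sides shows that

$$\frac{d}{dx}\left(F^{-1}(x)\right) = \frac{1}{f(F^{-1}(x))}.$$

Let $u = f(F^{-1}((x + 1)/2))$ and $dv = \frac{1}{2}\phi'(x)\,dx$, so
$du = f'(F^{-1}((x + 1)/2))/[2f(F^{-1}((x + 1)/2))]\,dx$ and $v = \phi(x)/2$. Finally, if
$\phi(x)f(F^{-1}((x + 1)/2)) \to 0$ as $x \to 0$ or 1, then integrating by parts yields

$$I = \frac{1}{4}\int_0^1 \phi(x)\left[-\frac{f'(F^{-1}((x + 1)/2))}{f(F^{-1}((x + 1)/2))}\right] dx.$$

Now $c$ in (2.9.3) can be written as

$$c = \frac{\int_0^1 \phi(u)\phi_f(u)\,du}{\left[\int_0^1 \phi^2(u)\,du\right]^{1/2}} \qquad (2.9.4)$$

where

$$\phi_f(u) = -\frac{f'\left(F^{-1}\left(\frac{u + 1}{2}\right)\right)}{f\left(F^{-1}\left(\frac{u + 1}{2}\right)\right)}. \qquad (2.9.5)$$

Note that this form of $c$ does not require differentiability of $\phi(u)$. This is the
form of $c$ derived by Hajek and Sidak (1967) under the restriction of finite
Fisher information.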

The estimates $\hat\theta$ discussed in Section 2.8 and derived from the general
scores statistic have asymptotic normal distributions. In fact, from
Theorem 2.6.5 we have that $n^{1/2}(\hat\theta - \theta)$ is approximately normal with mean 0 and
variance $1/c^2$, with $c$ given by (2.9.3) or (2.9.4). This agrees with (2.8.17)
derived from the influence curve.

There is a result on the approximate linearity of $V(\theta)$ corresponding to
Theorem 2.7.1. The version that corresponds to (2.7.6) states

!’(») - r(i>o) -XV“)*/2j 

Vi Vi-t'*’*")* 

(2 9 6) 

with $c$ given by (2.9.3). The approximation is uniform in probability in the
same sense as in Theorem 2.7.2. See van Eeden (1972).

If $[\hat\theta_L, \hat\theta_U)$ denotes the confidence interval derived from $V$, then the
approximate linearity can be used to show that


-A(6-6o)f 


2Z 


(2 97 ) 


similar to (2.7.8) in the Wilcoxon case. See Sen (1966) also.


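Example 2.9.1. From Example 2.8.2 the score function for a Winsorized
signed rank statistic is $\phi(u) = \min(u, 1 - \gamma)$. Then $\phi'(u) = 1$ if
$0 < u < 1 - \gamma$ and 0 otherwise, and hence

$$\phi'(2F(x) - 1) = \begin{cases} 1 & 0 < 2F(x) - 1 < 1 - \gamma \\ 0 & 1 - \gamma < 2F(x) - 1 < 1. \end{cases} \qquad (2.9.8)$$

If we let $x(\alpha) = F^{-1}(\alpha)$, the $\alpha$ percentile of the underlying distribution,
then (2.9.8) becomes

$$\phi'(2F(x) - 1) = \begin{cases} 1 & 0 < x < x(1 - \gamma/2) \\ 0 & x(1 - \gamma/2) < x < \infty, \end{cases}$$

and the efficacy becomes

$$c = \frac{2\int_0^{x(1 - \gamma/2)} f^2(x)\,dx}{\left[(1 - \gamma)^2(1 + 2\gamma)/12\right]^{1/2}} = \frac{\sqrt{12}\; 2\int_0^{x(1 - \gamma/2)} f^2(x)\,dx}{\left[(1 - \gamma)^2(1 + 2\gamma)\right]^{1/2}}.$$

The efficiency of the Winsorized Wilcoxon signed rank test, estimate, or
confidence interval relative to the ordinary Wilcoxon procedures is then

$$e = \frac{\left[2\int_0^{x(1 - \gamma/2)} f^2(x)\,dx\right]^2}{(1 - \gamma)^2(1 + 2\gamma)\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2}.$$

Table 2.5 gives the values of $e$ for various values of $\gamma$ and for normal,
double exponential (DE), and Cauchy distributions. Note that the optimal
Winsorization for the Cauchy distribution is $\gamma = .75$, not $\gamma = 1$, which you
might expect for such a heavy-tailed distribution.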
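The efficiency in Example 2.9.1 depends on the distribution only through the ratio $2\int_0^{x(1-\gamma/2)} f^2 / \int f^2$. The following sketch evaluates it for the double exponential and Cauchy cases. The closed forms for that ratio are worked out here under the usual standard forms of the two densities and are assumptions of this sketch, not formulas quoted from the text; the function names are illustrative.

```python
import math

def efficiency(gamma, captured):
    """e = captured^2 / ((1-gamma)^2 (1+2*gamma)).

    captured = 2 * int_0^{x(1-gamma/2)} f^2 dx / int f^2 dx.
    """
    return captured ** 2 / ((1.0 - gamma) ** 2 * (1.0 + 2.0 * gamma))

def captured_double_exponential(gamma):
    # f(x) = exp(-|x|)/2; at x = x(1 - gamma/2), exp(-x) = gamma,
    # so the ratio reduces to 1 - gamma^2.
    return 1.0 - gamma ** 2

def captured_cauchy(gamma):
    # f(x) = 1/(pi (1 + x^2)); x(1 - gamma/2) = tan(pi (1 - gamma)/2).
    k = math.tan(math.pi * (1.0 - gamma) / 2.0)
    return (2.0 / math.pi) * (k / (1.0 + k * k) + math.atan(k))

rows = [
    (gamma,
     efficiency(gamma, captured_double_exponential(gamma)),
     efficiency(gamma, captured_cauchy(gamma)))
    for gamma in (0.1, 0.2, 0.5, 0.7, 0.8)
]
```

Each entry of `rows` holds $\gamma$ followed by the DE and Cauchy efficiencies, which can be set beside the corresponding rows of Table 2.5.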

Table 2.5. Efficiency of Winsorized Wilcoxon Relative to T

                                    γ
Distribution    .1     .2     .5     .7     .8     .9     .98
Normal          0.99   0.94   0.92                        0.66
DE              1.01   1.03   1.13   1.20   1.25   1.29   1.33
Cauchy          1.03   1.09   1.34   1.43   1.44          1.35

Now, provided that $\phi_f$ satisfies Definition 2.8.2, we next show that $V_f$,
the statistic generated by $\phi_f(u)$, has maximum efficacy. Hence the test and
estimates derived from $V_f$ are optimal in the sense of asymptotic efficiency.

Theorem 2.9.1. Suppose $X_1, \ldots, X_n$ is a random sample from $F(x - \theta)$,
$F \in \Omega_s$, which satisfies Definition 2.9.1. Suppose $\phi_f$, defined by (2.9.5),
satisfies Definition 2.8.2 and generates $V_f$. Let $V$ be any other statistic
generated by a generating function $\phi(u)$; then

$$c \le c_f$$

and

$$c_f = (I(f))^{1/2},$$

where $c$ and $c_f$ are the efficacies of $V$ and $V_f$, respectively.

Proof. From (2.9.4) and the Cauchy-Schwarz inequality we have

$$c = \frac{\int_0^1 \phi(u)\phi_f(u)\,du}{\left[\int_0^1 \phi^2(u)\,du\right]^{1/2}} \le \left[\int_0^1 \phi_f^2(u)\,du\right]^{1/2}.$$

Next, making the change of variable $y = F^{-1}[(u + 1)/2]$, we have

$$\int_0^1 \phi_f^2(u)\,du = \int_0^1 \left[\frac{f'(F^{-1}((u + 1)/2))}{f(F^{-1}((u + 1)/2))}\right]^2 du = 2\int_0^{\infty} \left[\frac{f'(y)}{f(y)}\right]^2 f(y)\,dy = I(f).$$

Now, from (2.9.4), $c_f = \int_0^1 \phi_f^2(u)\,du / \left[\int_0^1 \phi_f^2(u)\,du\right]^{1/2} = (I(f))^{1/2}$.

Definition 2.9.2. Suppose $F \in \Omega_s$ and $-\log f(x)$ is convex. Then we say $f$
is strongly unimodal.

Note that if $f$ is strongly unimodal and if $0 < I(f) < \infty$, then $\phi_f(u)$
satisfies Definition 2.8.2 and is a scores-generating function. In fact, strong
unimodality is necessary.

From Theorem 2.6.1, the asymptotic local power of the test based on $V_f$
is $1 - \Phi(Z_\alpha - \theta c_f)$. When this is compared to $1 - \Phi(Z_\alpha - \theta c)$ for some
other statistic $V$, we have $1 - \Phi(Z_\alpha - \theta c_f) \ge 1 - \Phi(Z_\alpha - \theta c)$ since $c_f \ge c$.
Hence we refer to $V_f$ as the asymptotically most powerful rank test
(AMPRT). The asymptotic testing tolerance of $V_f$ is given in Exercise
2.10.30. A different type of optimal test, called a locally most powerful rank
test (LMPRT), is developed for the two-sample location model in the next
chapter. At the end of Section 3.5, we show how the corresponding
one-sample LMPRT can be constructed. See (3.5.25) and (3.5.16).

Note also that the estimate $\hat\theta_f$ derived from $V_f$ has the property that
$n^{1/2}(\hat\theta_f - \theta)$ has an asymptotic normal distribution with mean 0 and
variance $1/c_f^2 = 1/I(f)$, the Rao-Cramer lower bound. Hence $\hat\theta_f$ has the
same asymptotic variance as the maximum likelihood estimate. In this
sense $\hat\theta_f$ is an asymptotically efficient estimate. Bickel and Doksum (1977,
Section 4.4.C) briefly discuss the asymptotic efficiency of the maximum
likelihood estimate and give further references. See also Kendall and Stuart
(1973, Chapter 18). The influence curve for $\hat\theta_f$ is given in Exercise 2.10.31.
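The bound $c \le c_f = (I(f))^{1/2}$ of Theorem 2.9.1 can be checked numerically from the form (2.9.4). The sketch below assumes the logistic model, for which $-f'/f = 2F - 1$ and hence $\phi_f(u) = u$ (the Wilcoxon scores) and $I(f) = 1/3$. The midpoint rule, the competing score functions, and all names are illustrative choices.

```python
import math

def efficacy(phi, phi_f, n=20000):
    """c = int_0^1 phi*phi_f du / sqrt(int_0^1 phi^2 du), midpoint rule."""
    h = 1.0 / n
    cross = sum(phi((i + 0.5) * h) * phi_f((i + 0.5) * h) for i in range(n)) * h
    norm2 = sum(phi((i + 0.5) * h) ** 2 for i in range(n)) * h
    return cross / math.sqrt(norm2)

phi_logistic = lambda u: u        # phi_f of (2.9.5) for the logistic density

scores = {
    "wilcoxon": lambda u: u,
    "sign": lambda u: 1.0,
    "winsorized": lambda u: min(u, 0.7),
}
efficacies = {name: efficacy(phi, phi_logistic) for name, phi in scores.items()}

# c_f^2 should reproduce the Fisher information of the logistic, 1/3.
fisher_information = efficacy(phi_logistic, phi_logistic) ** 2
```

Under these assumptions the Wilcoxon entry attains the bound, while the sign and Winsorized entries fall below it.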

Example 2.9.2. Take $\psi(x)$ to be the standard normal density function.
Then $-\psi'(x)/\psi(x) = x$ and we have

$$\phi_\psi(u) = \Phi^{-1}\left(\frac{u + 1}{2}\right),$$

where $\Phi(\cdot)$ is the standard normal distribution function. Recall from
Example 2.8.1 that $\Phi_+(x) = P(|X| \le x) = 2\Phi(x) - 1$. Hence $\Phi(x) =
[\Phi_+(x) + 1]/2$. If $t = \Phi(x)$, then $x = \Phi^{-1}(t)$, and likewise
$t = [\Phi_+(x) + 1]/2$ implies $x = \Phi_+^{-1}(2t - 1)$. Thus,

$$\phi_\psi(u) = \Phi_+^{-1}(u),$$

and the optimal rank statistic is the one-sample normal scores statistic
of Example 2.8.1. Next note that

$$\phi_\psi'(u) = \frac{1}{2\psi(\Phi^{-1}[(u + 1)/2])}.$$

Hence, if the underlying distribution is $F$, then

$$\phi_\psi'(2F(x) - 1) = \frac{1}{2\psi(\Phi^{-1}(F(x)))}.$$

The calculations in Exercise 2.10.21 show $\int_0^1 \phi_\psi^2(u)\,du = 1$, and from (2.9.3)
the efficacy is

$$c = \int_{-\infty}^{\infty} \frac{f^2(x)}{\psi(\Phi^{-1}(F(x)))}\,dx. \qquad (2.9.9)$$

The efficiency of the normal scores procedures (test or estimates) relative to
the $t$ procedures when the underlying distribution is $F$, symmetric, is
$e(V_\Phi, t) = \sigma_f^2 c^2$, where $\sigma_f^2$ is the variance of $F$. From the scale invariance of
the efficiency, we can let $\sigma_f^2 = 1$ without loss of generality, and write

$$e(V_\Phi, t) = \left[\int_{-\infty}^{\infty} \frac{f^2(x)}{\psi(\Phi^{-1}(F(x)))}\,dx\right]^2. \qquad (2.9.10)$$

Note if we are sampling from a normal distribution, where $t$ is optimal,
then $F(x) = \Phi(x)$ and $f(x) = \psi(x)$, and (2.9.10) becomes $e(V_\Phi, t) = 1$.
Hence, at the normal distribution, the normal scores procedures are fully
efficient in the sense of Pitman efficiency. In the next theorem we consider
what happens to the efficiency when $F(x)$ is not normal. The result was
first proved by Chernoff and Savage (1958). The proof given here is due to
Gastwirth and Wolff (1968).

Theorem 2.9.2. Let $X_1, \ldots, X_n$ be a random sample from $F(x - \theta)$,
$F \in \Omega_s$; then

$$\inf_{\Omega_s} e(V_\Phi, t) = 1.$$

Hence the efficiency of normal scores procedures is never less than 1 when
sampling from a symmetric distribution.

Proof. If $\sigma_f^2 = \infty$ then $e(V_\Phi, t) > 1$; hence we suppose $\sigma_f^2 = 1$.
From (2.9.10) we can write

$$\sqrt{e} = \int_{-\infty}^{\infty} \frac{f(x)}{\psi(\Phi^{-1}(F(x)))} f(x)\,dx = E\left[\frac{1}{\psi(\Phi^{-1}(F(X)))/f(X)}\right].$$

Applying Jensen's inequality (Theorem A20 in the Appendix) to the convex
function $h(x) = 1/x$, we have

$$\sqrt{e} \ge \frac{1}{E\left[\psi(\Phi^{-1}(F(X)))/f(X)\right]}.$$

Hence

$$\frac{1}{\sqrt{e}} \le E\left[\frac{\psi(\Phi^{-1}(F(X)))}{f(X)}\right] = \int_{-\infty}^{\infty} \psi(\Phi^{-1}(F(x)))\,dx.$$

We now integrate by parts, using $u = \psi(\Phi^{-1}(F(x)))$,
$du = \psi'(\Phi^{-1}(F(x)))f(x)\,dx/\psi(\Phi^{-1}(F(x))) = -\Phi^{-1}(F(x))f(x)\,dx$, since
$\psi'(x)/\psi(x) = -x$. Hence, with $dv = dx$, we have

$$\int_{-\infty}^{\infty} \psi(\Phi^{-1}(F(x)))\,dx = x\psi(\Phi^{-1}(F(x)))\Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} x\Phi^{-1}(F(x))f(x)\,dx. \qquad (2.9.11)$$

Now transform $x\psi(\Phi^{-1}(F(x)))$ into $F^{-1}(\Phi(w))\psi(w)$ by first letting $t =
F(x)$ and then $w = \Phi^{-1}(t)$. The integral $\int F^{-1}(\Phi(w))\psi(w)\,dw = \int xf(x)\,dx
< \infty$; hence the limit of the integrand must be 0 as $w \to \pm\infty$. This implies
that the first term on the right side of (2.9.11) is 0. Hence, applying the
Cauchy-Schwarz inequality,

$$\frac{1}{\sqrt{e}} \le \int_{-\infty}^{\infty} x\Phi^{-1}(F(x))f(x)\,dx \le \left[\int_{-\infty}^{\infty} x^2 f(x)\,dx \int_{-\infty}^{\infty} \left[\Phi^{-1}(F(x))\right]^2 f(x)\,dx\right]^{1/2} = 1,$$

since $\int x^2 f(x)\,dx = 1$ and $\int x^2\psi(x)\,dx = 1$. Hence $\sqrt{e} \ge 1$ and $e \ge 1$, which
completes the proof. It should be noted that the inequality is strict except at
the normal distribution. Hence the normal scores procedures are strictly
more efficient than the $t$ procedures except at the normal model, where the
asymptotic relative efficiency is 1.
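Theorem 2.9.2 can be illustrated numerically. A minimal sketch, assuming the substitution $u = F(x)$ so that the efficacy (2.9.9) becomes $\int_0^1 f(F^{-1}(u))/\psi(\Phi^{-1}(u))\,du$, evaluates $e(V_\Phi, t) = \sigma_f^2 c^2$ by the midpoint rule for the logistic and the normal. The helper name and the grid size are arbitrary choices, and the logistic facts used (quantile density $u(1-u)$, variance $\pi^2/3$) are standard properties supplied here rather than taken from the text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def normal_scores_vs_t(density_at_quantile, variance, n=20000):
    """e = variance * (int_0^1 f(F^{-1}(u)) / psi(Phi^{-1}(u)) du)^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h                      # midpoints avoid u = 0 and u = 1
        total += density_at_quantile(u) / STD_NORMAL.pdf(STD_NORMAL.inv_cdf(u))
    return variance * (total * h) ** 2

# Logistic: f(F^{-1}(u)) = u(1 - u), variance pi^2/3.
e_logistic = normal_scores_vs_t(lambda u: u * (1.0 - u), math.pi ** 2 / 3.0)

# Normal: f(F^{-1}(u)) = psi(Phi^{-1}(u)), variance 1; the integrand is 1.
e_normal = normal_scores_vs_t(
    lambda u: STD_NORMAL.pdf(STD_NORMAL.inv_cdf(u)), 1.0
)
```

The normal case returns 1 up to rounding, and the logistic value should land slightly above 1, in line with the theorem.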

The implication of many of the results in this section is that as the
tailweight of the underlying distribution increases, we should use statistical
methods that emphasize the extreme sample values less. Resistant methods
provide such protection and also have excellent efficiency properties.
Examples are the regular or Winsorized Wilcoxon and normal scores
procedures relative to the $t$ procedures. Provided the tails are sufficiently
heavy, the sign procedures are also quite good.

We now provide a formal definition of tail ordering for distributions and
use this concept to compare the Wilcoxon and normal scores methods.

Definition 2.9.3. Suppose $F$ and $G$ are in $\Omega_s$. We say $F$ has lighter tails
than $G$ (or $G$ has heavier tails than $F$), denoted $F < G$, if $G^{-1}(F(x))$ is
convex for $x \ge 0$.

This definition was introduced by van Zwet (1970) in his study of
convex transformations of random variables. A survey of other definitions
of tail ordering is given by Hettmansperger and Keenan (1975). See also
Gastwirth (1970b).

Note that (1) $F < F$ and (2) $F < G$ and $G < H$ imply $F < H$. Hence $<$ is a
weak ordering. If $F < G$ and $G < F$ we say $F$ and $G$ are equivalent. Let
$F(x) = G(ax)$ for $a > 0$; then $G^{-1}(F(x)) = ax$, so $F < G$; also
$F^{-1}(G(x)) = x/a$, so $G < F$. Hence distributions that differ only in scale are
equivalent. This means that tail ordering is a family property and does not depend
upon scale.

Suppose $F, G \in \Omega_s$ with positive densities at 0. Further, without loss of
generality take $f(0) = g(0)$, since this can be achieved by rescaling the
distribution functions: $F(x) = F_1(x/a)$ with $a = f_1(0)/g(0)$. Now suppose
$F < G$, not equivalent. Then $q(x) = G^{-1}(F(x))$ is strictly convex for some $x$
and $q'(x) = f(x)/g(G^{-1}(F(x)))$ is strictly increasing for some $x$. Since
$q'(0) = f(0)/g(0) = 1$, $q'(x) > 1$ for some $x$, and eventually $G^{-1}(F(x)) > x$.
Hence $F(x) > G(x)$ eventually, and $1 - G(x) > 1 - F(x)$; so there is more
probability in the tail of $G$. In the next example we describe the ordering of
several distributions often used in efficiency calculations.

Example 2.9.3. Uniform < normal < logistic < double exponential < Cauchy.
We will verify some of these and leave others as exercises. First suppose
that $F \in \Omega_s$ and $f(x)$ is nonincreasing for $x > 0$. Let $U(x) = 0$ if $x < -1$,
$(x + 1)/2$ if $-1 \le x \le 1$, and 1 if $1 < x$, the uniform cdf on $(-1, 1)$. Then
if $q(x) = F^{-1}(U(x)) = F^{-1}[(x + 1)/2]$ we have
$q'(x) = 1/[2f(F^{-1}[(x + 1)/2])]$ is nondecreasing for $x > 0$, and so $U < F$.

Let $L(x) = 1/(1 + e^{-x})$, $-\infty < x < \infty$, denote the logistic cdf. We now
show $\Phi < L$. The inverse is $L^{-1}(y) = \log y - \log(1 - y)$. Let $\psi(x) = \Phi'(x)$, the
$n(0, 1)$ density function, and note that $\psi'(x) = -x\psi(x)$; then two
differentiations show that

$$\frac{d^2}{dx^2} L^{-1}(\Phi(x)) = \frac{\psi(x)}{\Phi^2(x)(1 - \Phi(x))^2}\left[\psi(x)(2\Phi(x) - 1) - x\Phi(x)(1 - \Phi(x))\right].$$

Now $L^{-1}(\Phi(x))$ is convex if the function in the brackets, denoted $r(x)$,
satisfies $r(x) \ge 0$. Since $r(0) = 0$ and $r(x) \to 0$ as $x \to \infty$, it is sufficient to
show $r''(x)$ changes from negative to positive. Now

$$r''(x) = -\psi(x) + 2\psi(x)\Phi(x) - 4x\psi^2(x) = \psi(x)s(x).$$

This function changes sign when $s(x)$ does, where

$$s(x) = -1 + 2\Phi(x) - 4x\psi(x).$$

Again, note $s(0) = 0$, $s(x) \to 1$ as $x \to \infty$, and $s'(0) = -2\psi(0) < 0$. Now
$s'(x) = 2\psi(x)(2x^2 - 1)$, so that $s'(x) < 0$ if $x < 1/2^{1/2}$ and $> 0$ if
$x > 1/2^{1/2}$. Hence $s(x)$ has exactly one change of sign. This implies that $r''(x)$
has one sign change and so $L^{-1}(\Phi(x))$ is convex, and finally $\Phi < L$. The
others are much easier to check and are left for the reader to do in Exercise
2.10.31.
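The sign analysis behind $\Phi < L$ in Example 2.9.3 can be spot-checked on a grid. The sketch below evaluates the bracketed function $r(x)$ and the factor $s(x)$ of $r''(x)$ at the points $0.01, 0.02, \ldots, 5$; the grid is an arbitrary choice, and a grid check of this kind is only an illustration, not a substitute for the argument in the text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def r(x):
    """Bracketed factor in the second derivative of L^{-1}(Phi(x))."""
    cdf = STD_NORMAL.cdf(x)
    upper = STD_NORMAL.cdf(-x)          # 1 - Phi(x), computed without cancellation
    return STD_NORMAL.pdf(x) * (2.0 * cdf - 1.0) - x * cdf * upper

def s(x):
    """Factor of r''(x) = psi(x) s(x)."""
    return -1.0 + 2.0 * STD_NORMAL.cdf(x) - 4.0 * x * STD_NORMAL.pdf(x)

grid = [0.01 * i for i in range(1, 501)]
r_is_nonnegative = all(r(x) >= 0.0 for x in grid)
sign_changes_of_s = sum(
    1 for a, b in zip(grid, grid[1:]) if (s(a) < 0.0) != (s(b) < 0.0)
)
```

On this grid `r_is_nonnegative` confirms convexity of $L^{-1}(\Phi(x))$, and `sign_changes_of_s` counts the sign changes of $s$.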

Example 2.9.4. Recall, from Definition 2.9.2, $f(x)$ is strongly unimodal if
$-\log f(x)$ is convex. In this example we will show that the double
exponential distribution is the heaviest tailed strongly unimodal distribution.
Let $D(x)$ be the double exponential cdf. Then for $x > 0$,
$D(x) = \int_{-\infty}^{x} (1/2)\exp\{-|t|\}\,dt = 1 - (1/2)e^{-x}$. Hence, for $y > 1/2$,
$D^{-1}(y) = -\log[2(1 - y)]$.

Suppose $F \in \Omega_s$ and $f$ is strongly unimodal; then we wish to show that
$F < D$. We will show $D^{-1}(F(x)) = -\log[2(1 - F(x))]$ is convex for $x > 0$
by showing that the derivative $f(x)/(1 - F(x))$ is nondecreasing for $x > 0$.
The first step is to show that if $-\log f(x)$ is convex, then
$f(x - y')/f(x - y) \le f(x' - y')/f(x' - y)$ for all $x < x'$ and $y < y'$. Let
$t = (x' - x)/(x' - x + y' - y)$; then $x - y = t(x - y') + (1 - t)(x' - y)$ and
$x' - y' = (1 - t)(x - y') + t(x' - y)$. Since $-\log f(x)$ is convex we have

$$-\log f(x - y) \le -t\log f(x - y') - (1 - t)\log f(x' - y)$$

and

$$-\log f(x' - y') \le -(1 - t)\log f(x - y') - t\log f(x' - y).$$

Add the two inequalities and exponentiate to establish the result. In fact,
the condition is necessary, and this shows that $-\log f(x)$ convex is
equivalent to $f(x - \theta)$ having monotone likelihood ratio. See Lehmann (1959, p. 330).
We now show $f(x)/(1 - F(x))$ is nondecreasing for $x > 0$. Let $t_1 < t_2$;
then the following statements are equivalent:

$$\frac{f(t_1)}{1 - F(t_1)} \le \frac{f(t_2)}{1 - F(t_2)}$$

$$f(t_1)(1 - F(t_2)) \le f(t_2)(1 - F(t_1))$$

$$f(t_1)\int_0^{\infty} f(v + t_2)\,dv \le f(t_2)\int_0^{\infty} f(v + t_1)\,dv$$

$$0 \le \int_0^{\infty} \left[f(t_2)f(v + t_1) - f(t_1)f(v + t_2)\right] dv.$$

Now identify $t_1 = x - y' < t_2 = x' - y'$, so $x < x'$, and $v = y' - y > 0$, so
$y < y'$. Then $t_1 + v = x - y$ and $t_2 + v = x' - y$, and from the first step we
have $f(t_1)/f(t_1 + v) \le f(t_2)/f(t_2 + v)$. This implies the integrand, and hence
the integral, is nonnegative, and so $f(x)/(1 - F(x))$ is nondecreasing. Hence
$D^{-1}(F(x))$ is convex for $x > 0$ and $F < D$.

Theorem 2.9.3. Suppose $F, G \in \Omega_s$. Then $F < G$ if and only if
$f(F^{-1}(y))/g(G^{-1}(y))$ is nondecreasing for $y \ge 1/2$.

Proof. Let $q(x) = G^{-1}(F(x))$. Then $q'(x) = f(x)/g(G^{-1}(F(x)))$ is a
nondecreasing function for $x > 0$. Further, $x = F^{-1}(y)$ is nondecreasing, and
$x \ge 0$ when $y \ge 1/2$. Hence $q'(F^{-1}(y))$ is nondecreasing for $y \ge 1/2$.

We now discuss the effect of heavy tails on the efficiency of Wilcoxon
versus normal scores procedures. Additional discussion is given in Hodges
and Lehmann (1960). First, note from the definition of efficiency that by
(2.6.13) and (2.9.9),

$$e_F(T, NS) = \frac{12\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2}{\left[\int_{-\infty}^{\infty} \dfrac{f^2(x)}{\psi(\Phi^{-1}(F(x)))}\,dx\right]^2}.$$

Since $F$ and $\Phi$ are in $\Omega_s$, the integrals are 2 times the integrals from 0 to $\infty$.
Using this fact along with a change of variable $u = F(x)$, we have

$$e_F(T, NS) = 12\left[\frac{\int_{1/2}^{1} f(F^{-1}(u))\,du}{\int_{1/2}^{1} \dfrac{f(F^{-1}(u))}{\psi(\Phi^{-1}(u))}\,du}\right]^2. \qquad (2.9.12)$$

By letting $f(x) = a^{-1}f_1(xa^{-1})$ and computing $e_F(T, NS)$, we see that
$e_F(T, NS)$ is independent of $a$.


Theorem 2.9.4. For any $F \in \Omega_s$ for which $e_F(T, NS)$ exists,
$0 \le e_F(T, NS) \le 1.91$.

Proof. For $1/2 \le x < y$, $\Phi^{-1}(x) < \Phi^{-1}(y)$, and so
$1/\psi(\Phi^{-1}(x)) < 1/\psi(\Phi^{-1}(y))$. From (2.9.12) we have

$$\int_{1/2}^{1} \frac{f(F^{-1}(u))}{\psi(\Phi^{-1}(u))}\,du \ge \frac{1}{\psi(\Phi^{-1}(1/2))}\int_{1/2}^{1} f(F^{-1}(u))\,du$$

and

$$e_F(T, NS) \le 12\left(\psi(\Phi^{-1}(1/2))\right)^2 = 12\psi^2(0) = 6/\pi \doteq 1.91.$$

Furthermore, let $f(x) = 1/2$ for $|x| < 1$ and 0 otherwise; then the
denominator of (2.9.11) is $\infty$ and $e_F(T, NS) = 0$. Exercise 2.10.33 shows the upper
bound is sharp.
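The bounds of Theorem 2.9.4 can be examined with a small numerical sketch of (2.9.12). It assumes only that the quantile density $f(F^{-1}(u))$ is available on $1/2 < u < 1$; the midpoint rule, the two test distributions, and the function name are illustrative choices.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def wilcoxon_vs_normal_scores(density_at_quantile, n=20000):
    """e_F(T, NS) from (2.9.12), midpoint rule on 1/2 < u < 1."""
    h = 0.5 / n
    numer = 0.0
    denom = 0.0
    for i in range(n):
        u = 0.5 + (i + 0.5) * h
        fq = density_at_quantile(u)
        numer += fq
        denom += fq / STD_NORMAL.pdf(STD_NORMAL.inv_cdf(u))
    return 12.0 * (numer / denom) ** 2      # the step h cancels in the ratio

upper_bound = 6.0 / math.pi                 # 12 * psi(0)^2

e_normal = wilcoxon_vs_normal_scores(
    lambda u: STD_NORMAL.pdf(STD_NORMAL.inv_cdf(u))
)
e_logistic = wilcoxon_vs_normal_scores(lambda u: u * (1.0 - u))
```

Both values should fall between 0 and `upper_bound`, and the logistic value should exceed the normal one, consistent with the logistic having the heavier tails.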

Hence, unlike the $t$ procedures, the normal scores procedures may be
much more efficient than the Wilcoxon procedures, at least for light-tailed
distributions approaching the uniform. The next theorem describes the
relationship between $e_F(T, NS)$ and the tailweight of $F$.

Theorem 2.9.5. Suppose $F, G \in \Omega_s$ and $F < G$. Then
$e_F(T, NS) \le e_G(T, NS)$.

Proof. Since $e_F(T, NS)$ is scale invariant, we can choose the scale. Let
$f(x) = a^{-1}f_1(xa^{-1})$ where $a = \int f_1^2(x)\,dx / \int g^2(x)\,dx$. Then
$\int f^2(x)\,dx = a^{-1}\int f_1^2(x)\,dx = \int g^2(x)\,dx$. Hence, without loss of
generality, we take $\int f^2(x)\,dx = \int g^2(x)\,dx$.

From (2.9.11) or (2.9.12) it follows that it is sufficient to show

$$\int_{1/2}^{1} \frac{f(F^{-1}(u))}{\psi(\Phi^{-1}(u))}\,du \ge \int_{1/2}^{1} \frac{g(G^{-1}(u))}{\psi(\Phi^{-1}(u))}\,du. \qquad (2.9.13)$$

Since $2\int_{1/2}^{1} f(F^{-1}(u))\,du = \int_{-\infty}^{\infty} f^2(x)\,dx = \int_{-\infty}^{\infty} g^2(x)\,dx
= 2\int_{1/2}^{1} g(G^{-1}(u))\,du$, we have

$$\int_{1/2}^{1} g(G^{-1}(u))\,du - \int_{1/2}^{1} f(F^{-1}(u))\,du = 0. \qquad (2.9.14)$$

Since $g(G^{-1}(u)) > 0$ and since, by Theorem 2.9.3, $f(F^{-1}(u))/g(G^{-1}(u))$ is
nondecreasing for $u \ge 1/2$, the zero integral implies the existence of a point
$c + 1/2$, $c > 0$, such that $1 - f(F^{-1}(u))/g(G^{-1}(u))$ changes from positive to
negative as $u$ crosses $c + 1/2$. Hence $f(F^{-1}(u)) - g(G^{-1}(u)) \le 0$ or $\ge 0$ as
$u < c + 1/2$ or $u > c + 1/2$.

Recall from the proof of Theorem 2.9.4 that $1/\psi(\Phi^{-1}(u))$ is nondecreasing
for $u \ge 1/2$. Hence, from (2.9.13), and noting where the sign change
occurs for the integrand, we have

$$\int_{1/2}^{1} \frac{1}{\psi(\Phi^{-1}(u))}\left[f(F^{-1}(u)) - g(G^{-1}(u))\right] du = \int_{1/2}^{1/2 + c} + \int_{1/2 + c}^{1}$$
$$\ge \frac{1}{\psi(\Phi^{-1}(c + 1/2))}\int_{1/2}^{1} \left[f(F^{-1}(u)) - g(G^{-1}(u))\right] du.$$

But the integral of the difference is 0 as noted in (2.9.14) and the proof is
complete.

Hence, as the tails become heavier, the efficiency of $T$ relative to $NS$
increases. At the normal distribution $e_\Phi(T, NS) = 0.955$. Hence
$0.955 \le e_F(T, NS) \le 1.91$ for any $F$ such that $\Phi < F$. Exercise 2.10.34 provides a
similar result for $e_F(S, T)$. A more general result for scores functions is
given by Gastwirth (1970a).

We now present a result that shows that some Winsorization of the score
function is desirable. We consider the symmetric contamination model
introduced by Huber (1964) and discussed by Huber (1981, Chapter 4).

Let $G$ in $\Omega_s$ be a specified distribution with a strongly unimodal density,
Definition 2.9.2. Let $0 < \epsilon < 1$ be fixed and define

$$\Omega(\epsilon) = \{F = (1 - \epsilon)G + \epsilon H : H \in \Omega_s,\ I(h) < \infty\}$$

where $I(h)$ is Fisher's information, Definition 2.9.1. We think of $G$ as the
assumed model and some $F$ in $\Omega(\epsilon)$ as the true model. Note $G$ may be the
true model since it is also in $\Omega(\epsilon)$.

In Theorem 2.9.7 we will show that a Winsorized version of the AMPRT
corresponding to $G$ will have maximin asymptotic power over $\Omega(\epsilon)$. Hence
this rank test will maximize the minimum of asymptotic local power over
the contaminated distributions. The maximin test statistic will be generated
by a scores-generating function $\phi_0(u)$, (2.9.5), that corresponds to a least
favorable distribution $F_0$ in $\Omega(\epsilon)$.

If $\hat\theta_0$ is the Hodges-Lehmann estimate corresponding to $\phi_0(\cdot)$, then we
will show that it has minimax asymptotic variance over $\Omega(\epsilon)$. It will thus be
seen that the robust procedures are generated by a Winsorized
scores-generating function that is similar to the Huber function generating
robust M estimates.

In general, let $c(\phi, F)$ denote the efficacy at the distribution $F$ of a
test based on $V$ generated by $\phi(\cdot)$. We seek a generating function $\phi_0(\cdot)$
such that

$$\inf_{\Omega(\epsilon)} c(\phi, F) \le \inf_{\Omega(\epsilon)} c(\phi_0, F). \qquad (2.9.15)$$

In this case we say that $V_0$ corresponding to $\phi_0(\cdot)$ has maximin efficacy
over $\Omega(\epsilon)$. Thus when the true model is in $\Omega(\epsilon)$, $V_0$ is the least adversely
affected by contamination in the sense defined by (2.9.15).

In the next theorem we construct a least favorable distribution $F_0$ in $\Omega(\epsilon)$
in the sense that the asymptotic efficacy of the best test (AMPRT) is a
minimum at $F_0$. Example 2.9.4 suggests that the least favorable distribution
will look like $g$ in the middle and have exponential tails. This will produce
the heaviest tailed strongly unimodal distribution that is hardest to
distinguish from the assumed model $G$.


Theorem 2.9.6. Let $x_0 > 0$ and $k > 0$ be constants such that
$-g'(x_0)/g(x_0) = k$ and $(1 - \epsilon)^{-1} = 2G(x_0) - 1 + 2g(x_0)/k$. Let

$$f_0(x) = \begin{cases} (1 - \epsilon)g(x_0)\exp\{k(x + x_0)\} & \text{if } x \le -x_0 \\ (1 - \epsilon)g(x) & \text{if } -x_0 < x < x_0 \\ (1 - \epsilon)g(x_0)\exp\{-k(x - x_0)\} & \text{if } x_0 \le x. \end{cases} \qquad (2.9.16)$$

Then

a. $f_0(x)$ is a density function, its cdf is in $\Omega(\epsilon)$, and
b. $c(\phi_0, F) \ge c(\phi_0, F_0)$, where $\phi_0(\cdot)$ is the optimal scores function
defined in (2.9.5) by $F_0$.

Proof. a. Consider first the integral of $f_0(x)$:

$$\int f_0(x)\,dx = (1 - \epsilon)g(-x_0)/k + (1 - \epsilon)[G(x_0) - G(-x_0)] + (1 - \epsilon)g(x_0)/k$$
$$= (1 - \epsilon)[2G(x_0) - 1] + (1 - \epsilon)2g(x_0)/k = 1.$$

Now define $h_0(x)$ by $f_0(x) = (1 - \epsilon)g(x) + \epsilon h_0(x)$; that is,
$h_0(x) = [f_0(x) - (1 - \epsilon)g(x)]/\epsilon$. Hence $\int h_0(x)\,dx = 1$. We next show
$h_0(x) \ge 0$ for all $x$. First note that

$$\epsilon h_0(x) = f_0(x) - (1 - \epsilon)g(x) = \begin{cases} (1 - \epsilon)[g(-x_0)\exp\{k(x + x_0)\} - g(x)] & \text{if } x \le -x_0 \\ 0 & \text{if } -x_0 < x < x_0 \\ (1 - \epsilon)[g(x_0)\exp\{-k(x - x_0)\} - g(x)] & \text{if } x_0 \le x. \end{cases}$$

For $x \le -x_0$, we first show $g(-x_0)\exp\{k(x + x_0)\} - g(x) \ge 0$. Since
$-\log g(x)$ is convex, it lies above its tangent at $-x_0$; hence
$-\log g(x) \ge -\log g(-x_0) + (-g'(-x_0)/g(-x_0))(x + x_0)$, for $x \le -x_0$. Further, by
symmetry, $-g'(-x_0)/g(-x_0) = -k$. Hence

$$-\log g(x) \ge -\log g(-x_0) - k(x + x_0)$$

and

$$g(x) \le g(-x_0)\exp\{k(x + x_0)\}.$$

This establishes the first line in $\epsilon h_0(x)$; the line for $x_0 \le x$ follows in a
similar way. Hence $h_0(x) \ge 0$, and $F_0$ is in $\Omega(\epsilon)$.

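b. Note from (2.9.3) the denominator of $c(\phi_0, F)$ is free of $F$. Hence we
will consider the numerator:

$$J(F) = \int_{-\infty}^{\infty} \phi_0'(2F(x) - 1) f^2(x)\,dx. \qquad (2.9.17)$$

From the definition of $\phi_0(u)$ in (2.9.5), using $f_0(x)$ in (2.9.16), we have

$$\phi_0(2F_0(x) - 1) = -\frac{f_0'(x)}{f_0(x)} = \begin{cases} -k & x \le -x_0 \\ -g'(x)/g(x) & -x_0 < x < x_0 \\ k & x_0 \le x. \end{cases} \qquad (2.9.18)$$

Hence $\phi_0'(2F_0(x) - 1) = 0$ for $|x| > x_0$, and so

$$J(F) = 2\int_0^{F^{-1}(F_0(x_0))} \phi_0'(2F(x) - 1) f^2(x)\,dx = 2\int_{1/2}^{F_0(x_0)} \phi_0'(2u - 1) f(F^{-1}(u))\,du. \qquad (2.9.19)$$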




We now show $f(F^{-1}(u)) \ge f_0(F_0^{-1}(u))$ for $1/2 \le u \le F_0(x_0)$. For
$0 \le x \le x_0$,

$$f_0(x) = (1 - \epsilon)g(x) \le (1 - \epsilon)g(x) + \epsilon h(x) = f(x). \qquad (2.9.20)$$

Hence $F_0(x) \le F(x)$ and $F_0(F^{-1}(u)) \le F(F^{-1}(u)) = u$ for $1/2 \le u \le
F_0(x_0)$. Taking $F_0^{-1}$ of both sides yields

$$0 \le F^{-1}(u) \le F_0^{-1}(u) \qquad (2.9.21)$$

for $1/2 \le u \le F_0(x_0)$. Now, since $-\log g(x)$ is convex, so is $-\log f_0(x)$.
Further, $-f_0'(x)/f_0(x) \ge 0$ implies that $-\log f_0(x)$ is nondecreasing,
$\log f_0(x)$ nonincreasing, and finally $f_0(x)$ is nonincreasing. Hence, by
(2.9.20) and (2.9.21), $f(F^{-1}(u)) \ge f_0(F^{-1}(u)) \ge f_0(F_0^{-1}(u))$.

The inequalities $f(F^{-1}(u)) \ge f_0(F_0^{-1}(u))$ and $F_0(x_0) \le F(x_0)$, along with
(2.9.19), imply that $J(F) \ge J(F_0)$. This implies $c(\phi_0, F) \ge c(\phi_0, F_0)$, and the
proof is complete.
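The construction in Theorem 2.9.6 can be carried out numerically for a particular assumed model. The following sketch takes $G$ to be standard normal, which is an assumption made only for illustration; then $-g'(x)/g(x) = x$, so $k = x_0$, and the second condition of the theorem is a single equation in $x_0$ that bisection can solve. All function names are hypothetical.

```python
import math
from statistics import NormalDist

G = NormalDist()   # assumed model: standard normal, so k = x0

def solve_x0(eps, lo=1e-6, hi=10.0, iters=100):
    """Bisection on 1/(1-eps) = 2G(x0) - 1 + 2 g(x0)/k with k = x0."""
    def gap(x0):
        return 2.0 * G.cdf(x0) - 1.0 + 2.0 * G.pdf(x0) / x0 - 1.0 / (1.0 - eps)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gap(mid) > 0.0:      # the left side is decreasing in x0
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def least_favorable_density(x, eps, x0):
    """f_0 of (2.9.16): (1-eps) g in the middle, exponential tails of rate k."""
    k = x0
    if abs(x) < x0:
        return (1.0 - eps) * G.pdf(x)
    return (1.0 - eps) * G.pdf(x0) * math.exp(-k * (abs(x) - x0))

eps = 0.10
x0 = solve_x0(eps)

# Midpoint-rule check that f_0 integrates to 1 over [-30, 30].
step = 0.001
total_mass = sum(
    least_favorable_density(-30.0 + (i + 0.5) * step, eps, x0)
    for i in range(60000)
) * step
```

Part a of the theorem corresponds to `total_mass` being 1 and to $f_0(x) \ge (1 - \epsilon)g(x)$ everywhere, which is what makes $h_0$ a density.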

We are now ready to show that $\phi_0$, constructed from $f_0(x)$ in (2.9.16),
produces the maximin test with property (2.9.15).

Theorem 2.9.7. Suppose $G \in \Omega_s$ has a strongly unimodal density and
$F \in \Omega(\epsilon)$. Let $\phi_0$ be the score function constructed from $f_0(x)$ in (2.9.16),
and let $V_0$ be the test generated by $\phi_0$. Let $c(\phi, F)$ denote the efficacy of the
test generated by $\phi$. Then

$$\inf_{\Omega(\epsilon)}\{1 - \Phi(Z_\alpha - \theta c(\phi_0, F))\} \ge \inf_{\Omega(\epsilon)}\{1 - \Phi(Z_\alpha - \theta c(\phi, F))\}$$

for any other $\phi$. The expression $1 - \Phi(Z_\alpha - \theta c(\phi, F))$ is the asymptotic
local power along the sequence of alternatives $\theta_n = \theta/n^{1/2}$ of an
asymptotically size $\alpha$ test. See Theorem 2.6.1.

Proof. From the last theorem $c(\phi_0, F) \ge c(\phi_0, F_0)$, and hence

$$\inf_{\Omega(\epsilon)} c(\phi_0, F) \ge c(\phi_0, F_0).$$

From Theorem 2.9.1 we have $c(\phi_f, F) \ge c(\phi, F)$. This holds even if $\phi_f$ is not
a score-generating function since $c^2(\phi_f, F) = I(f)$. Hence

$$c(\phi_0, F_0) \ge c(\phi, F_0) \ge \inf_{\Omega(\epsilon)} c(\phi, F).$$

Combining these inequalities we have

$$\inf_{\Omega(\epsilon)} c(\phi_0, F) \ge \inf_{\Omega(\epsilon)} c(\phi, F). \qquad (2.9.22)$$

Hence

$$1 - \Phi\left(Z_\alpha - \theta\inf_{\Omega(\epsilon)} c(\phi, F)\right) \le 1 - \Phi\left(Z_\alpha - \theta\inf_{\Omega(\epsilon)} c(\phi_0, F)\right).$$

The result now follows from the fact that $\Phi$ is an increasing function.


Let $\hat\theta_\phi$ be the Hodges-Lehmann estimate of $\theta$ based on $V$ generated by
$\phi$. Let $\sigma^2(\phi, F)$ denote the asymptotic variance, where $n^{1/2}(\hat\theta_\phi - \theta)$ is
asymptotically $n(0, \sigma^2(\phi, F))$. Recall that $\sigma^2(\phi, F) = 1/c^2(\phi, F)$. Let $\hat\theta_0$
correspond to $\phi_0$ specified in Theorem 2.9.6. Then (2.9.22) implies

$$\sup_{\Omega(\epsilon)} \sigma^2(\phi_0, F) \le \sup_{\Omega(\epsilon)} \sigma^2(\phi, F). \qquad (2.9.23)$$

Thus, the asymptotic variance of $\hat\theta_0$ minimizes the maximum asymptotic
variance over $\Omega(\epsilon)$. Jaeckel (1971) developed the result, (2.9.23), when he
showed that there exist R and L estimates that solve Huber's minimax
problem for $\Omega(\epsilon)$. Huber (1981, p. 97) points out that the minimax result,
(2.9.23), cannot always be extended, for rank estimates, to more general
types of neighborhoods than $\Omega(\epsilon)$. Sacks and Ylvisaker (1982) provide an
example. See Collins (1983) also.

We next discuss the form of the generating function $\phi_0(u)$ corresponding
to an assumed strongly unimodal $G$. Let $\phi_g(u)$ be the generating function
constructed from $g$, in (2.9.5). Note, from Theorem 2.9.6, $-g'(x_0)/g(x_0) = k$.
Define $u_g$ such that $\phi_g(u_g) = -g'(G^{-1}[(u_g + 1)/2])/g(G^{-1}[(u_g + 1)/2]) = k$.
Then from (2.9.18) we have, for $0 < u < 1$,

$$\phi_0(u) = \min\{\phi_g(u), k\}. \qquad (2.9.24)$$

Thus we see that the optimal score function $\phi_0$ is a Winsorized version of
$\phi_g(u)$. The amount of Winsorization depends on the amount of contamination.
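The Winsorizing step in (2.9.24) can be sketched numerically. The following is a minimal illustration, assuming the logistic model treated later in this section (for which $-g'(x)/g(x) = 2G(x) - 1$); the function names and the value k = 0.72 are chosen here only for illustration and are not part of the original development.

```python
import math

def logistic_score(u):
    """phi_g(u) = -g'(x)/g(x) evaluated at x = G^{-1}((u + 1)/2), logistic G.

    Valid for 0 < u < 1. For the logistic distribution this reduces to u,
    i.e. the Wilcoxon score.
    """
    p = (u + 1.0) / 2.0
    x = math.log(p / (1.0 - p))        # G^{-1}(p) for the logistic cdf
    G = 1.0 / (1.0 + math.exp(-x))     # G(x)
    return 2.0 * G - 1.0               # -g'(x)/g(x) = 2G(x) - 1

def winsorized_score(u, k, base_score=logistic_score):
    """phi_0(u) = min{phi_g(u), k}: the base score cut off at level k."""
    return min(base_score(u), k)

if __name__ == "__main__":
    k = 0.72  # hypothetical cutoff, used only to show the truncation
    for u in (0.1, 0.5, 0.72, 0.9):
        print(u, round(winsorized_score(u, k), 4))
```

Below the cutoff the score is the Wilcoxon score; above it the score is constant, which is what bounds the influence of extreme observations.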

If the assumed model is normal, then a Winsorized normal scores 
function is best. Winsorization will result in a bounded influence function 
(Theorem 2.8.4) and avoid the peculiarity of Example 2.8.6 in which the 
normal scores estimate has an unbounded influence function. See Huber
(1981, p. 99).
Example 2.9.2. We illustrate the calculations involved in the minimax
asymptotic variance for the contaminated logistic model. The Winsorized
Wilcoxon statistic is best in this case. We have treated various aspects of
the Winsorized Wilcoxon in Examples 2.8.2, 2.8.4, 2.8.5, and 2.9.1. We let
$g(x)$ denote the logistic pdf, $g(x) = e^{-x}/(1 + e^{-x})^2$, $-\infty < x < \infty$. The
cdf is given by $G(x) = 1/(1 + e^{-x})$, $-\infty < x < \infty$, and the relationship
$g(x) = G(x)(1 - G(x))$ simplifies many calculations. Note that $-g'(x)/g(x) = 2G(x) - 1$;
hence the conditions in Theorem 2.9.6 become




-2C(jr4)-l 


k 


and 




-k 


2Jc 


(29 25) 


From (2.9.24) we have $\phi_0(u) = \min\{u, k\}$ and $k = 1 - \gamma$ in Example 2.8.2.
Since $0 < \gamma < 1$, $k < 1$; solving (2.9.25) for $k$, we take the root


Ar-(1- 


(2 926) 


The efficacy is given in Example 2.9.1, so we can compute $\sigma^2(\phi_0, F_0)$ from
$1/c^2$. First note that


(2-3y' + y>) 
24 


where $x(1 - \gamma/2) = G^{-1}(1 - \gamma/2)$. Hence, using the efficacy equation in
Example 2.9.1, after some algebra,
Table 2.6. $1 - \gamma$ and $\sigma^2(\phi_0, F_0)$ as Functions of $\epsilon$

                                          ε
Function                  .05    .07    .10    .12    .15    .20

$1 - \gamma$              .72    .69    .63           .56    .50

$\sigma^2(\phi_0, F_0)$  3.00


In Table 2.6 we show $1 - \gamma$ and $\sigma^2(\phi_0, F_0)$ as a function of $\epsilon$. This suggests
that a mixture of roughly 2/3 Wilcoxon scores and 1/3 sign scores should
provide adequate protection against mild to moderate amounts of contamination.
Table 2.5 compares the Wilcoxon (best at the assumed model) to
the Winsorized Wilcoxon [minimax over $\Omega(\epsilon)$].


2.10. EXERCISES 

2.10.1. Under the null hypothesis $H_0: \theta = 0$, $F \in \Omega_s$, show the mean and
variance of $T$, the Wilcoxon signed rank statistic, are $ET = n(n + 1)/4$
and $\operatorname{Var} T = n(n + 1)(2n + 1)/24$, where $n$ is the sample size.

2.10.2. Suppose $X_1, \ldots, X_n$ are i.i.d. $F \in \Omega_s$.

a. Show that $g(X_1, \ldots, X_n)$ and $g(-X_1, \ldots, -X_n)$ have the
same distribution. Hint: Show that $P(g(X_1, \ldots, X_n) \le t) = P(g(-X_1, \ldots, -X_n) \le t)$.

b. Show that if $g(X_1, \ldots, X_n) + g(-X_1, \ldots, -X_n) = \mu_0$ then
$g(X_1, \ldots, X_n)$ is symmetrically distributed about $\mu_0/2$. Hint:
Show $P(g(X_1, \ldots, X_n) \le \mu_0/2 - t) = P(g(X_1, \ldots, X_n) \ge \mu_0/2 + t)$.

c. Apply (b) to the Wilcoxon signed rank statistic to show that
$T$ has a symmetric distribution under the null hypothesis.
What is the point of symmetry?

2.10.3. In this exercise we consider conditions on the marginal distributions
of a pair of possibly dependent random variables $(T, C)$ so
that the difference $T - C$ will be symmetrically distributed.
Prove:

a. If the marginal distributions are identical (but not necessarily
symmetric) then $T - C$ is symmetrically distributed.

b. If the marginal distributions are symmetric (but may have
different scale parameters) the distribution of $T - C$ is symmetric.
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2 10.4. Let I'j =^a,fV, and where IF,, . , tl'„ are iid 

B(,\, 1/2) Show that 

Use Theorem A9 and prove that, for the VVilcoxon signed rank 
statulic r, (2 7 2) under //,j. 


r-«(/H-l)/4 
l)(2fl+ l)/24 


4z~n(0,l) 


Further, find limp,<S,TX where p.(S T) >s the correlation be- 
tween the sign statistic S and T Finally, using Theorem A13, 
show (hat S and T when properly standardized, have an asymp- 
totic bivanate normal dmnbution Hint Define T* * (l/(n + 1) 
n'/'lSylK', - 1/2) and S- ~ 1/2) 

2 10.5 Show that the tVilcoxon signed rank test has tolerances given by 


T,(accept) « 


2/1- I 



T^rcject) = 


2/1 - I - 

2n 


Use the approximation (2 2 9) for C„ , the critical value of the test, 
to show that both tolerances converge to t = I — 1/2''”^ the 
estimation tolerance of the median of the Walsh averages in 
Example 2 4 I 

2.10.6 Prove Theorem 242 for #B = 2q-¥ 1, odd 

2107. Define fl= {(t.;) Then fl = med(X', + 

in 5 IS called Calton’s «timate by Hodges (1967) Find 
the tolerance of 6 
2.10.8. In (2.5.1), show that $p_3 = (p_1^2 + p_2)/2$.

2.10.9. Compute /?„ p^, and p^ in (2.5.1) for a uniform distribution on 
(—1,2). Approximate the power of the sign test and Wilcoxon 
signed rank test in Example 2.5.5, for n = 40 and a = .05. Use the 
Central Limit Theorem to make the corresponding approxima- 
tion for the t test. 

2.10.10. Show the asymptotic power, for a sequence of alternatives
$\theta_n = \theta/n^{1/2}$, of the sign test is $1 - \Phi(Z_\alpha - 2f(0)\theta)$. Compare this to
(2.5.27), for the corresponding asymptotic power of the Wilcoxon
signed rank test, for $f(x)$ the $n(0, 1)$ density function.

2.10.11. Show that the sample size equation, (2.6.3), for the sign test holds
in Example 2.6.1.

2 . 10 . 12 . In this exercise we outline a proof that the Wilcoxon signed rank 

statistic, when properly standardized, is asymptotically normal, 
uniformly near 0 = 0, provided the variance is bounded away 
from 0. Suppose 2^,, . . . , An is a sample from F(x — 0), F G 
and suppose a\ 0 ) = p^{ 0 ) — p\(, 0 ) where p^{ 0 ) and p 2 { 6 ) are 
defined in (2.5.1). Further, suppose there exists a constant K such 
that 0 < < a\0) for all 0 in a neighborhood of 0. (This will be 

true if a^(0) > 0 and a\0) is continuous.) Define 


r = 


Ky 


n{n + 1) ‘J 


T* = 




«(« + 1) ,<y 


with Ty given by (2.5.3). 

a. Show that in a neighborhood of 0 = 0, 


a{0) ?(0) ^/:(/7+l)' 

Henc^r and T* have the same asymptotic behavior; how- 
ever, T* is easier to deal with, 
b. Define 


V* = 


a(^ 0 )n{n — 1) 


><j 
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and show EV * «» 0 and 


Var V* 


, n-2 

to\6)(n - 1) n - I ■ 


c N«at let y, - 1 “ «(- jr,). H(x) » F{x - 6), and show that 
the projection of V* is 

Further, show £1^ = 0 and Var V* = I 
d Let £ = K* — F* Then for < > 0 show 

< p(V < i) < p{y* </ + <) + £(|/?1 > i) 


e Now show, for some constant C and specified S >0, 




for sufficiently large n 
f {fence conclude (hat for any c > 0, 

9(t - t) - 2« - ^r) < £( y* < r) - «(») 

< <P{l + e) + 2S - *(0 

Now, since <!'( ) is continuous, as e-»0 we have 
|£(F*</)-4^<)|<26 

for sufficiently large n, umfortaly in 6 near 0 
2.10.13. Show that the efficiency equations (2.6.16)-(2.6.18) are scale
invariant. Hint: Let $g(x) = \tau^{-1} f(\tau^{-1} x)$ and show the equations
do not depend on $\tau$.
2.10.14. a. Compute $e(T, t)$, (2.6.18), when $f(x) = e^{-x}/(1 + e^{-x})^2$,
$-\infty < x < \infty$, the logistic distribution; when $f(x) = e^{-|x|}/2$,
$-\infty < x < \infty$, the Laplace or double exponential distribution;
and $f(x) = 2^{-1}$, $-1 < x < 1$, the uniform distribution.

b. What happens to $e(T, t)$ if $f(x) = 1/[\pi(1 + x^2)]$, $-\infty < x < \infty$,
the Cauchy distribution?

2.10.15. Let $f(x; \tau) = \tau e^{-|x|^\tau}/[2\Gamma(\tau^{-1})]$, $-\infty < x < \infty$. The parameter $\tau$
determines a family of distributions. For example a normal
distribution corresponds to $\tau = 2$ and a double exponential distribution
corresponds to $\tau = 1$.

a. Find $e(S, T)$, (2.6.16), as a function of $\tau$.

b. Graph $e(S, T)$ as a function of $\tau$ and find the point where
$e(S, T) = 1$. This indicates how "far" from normality we
need to go before $S$ is more efficient than $T$.

2.10.16. Find e{S,t), (2.6.17), for the normal and double exponential 
distributions. 

2.10.17. Suppose is a (1 — a) 100% confidence interval derived 

from the Wilcoxon signed rank statistic T. If Oq is the true 
parameter value, show that — ^o) is asymptotically 

1 /c^) where Z „/2 is the upper a/2 percentile of the 
standard normal distribution and c\— 12(//^(x)dx)^. Hint; Use 
the argument of Theorem 2.6.5. 

2.10.18. Argue that equations for S and t similar to (2.7.8) can be derived. 

In other words, is approximately (in probabil- 

ity) equal to 1 f2f(0) and oy for S and f, respectively. In particu- 
lar, prove that if F G SIq, /(O) < oo, then 

lim Po| sup |Vn ) - 5(0) j + bf{0)\ > e| = 0, 

where 5= ^“'^^(A',). 

2.10.19. Apply the result in Exercise 2.10.2 to show that $V$, defined in
(2.8.1), has a symmetric distribution under the null hypothesis
when we are sampling from a population symmetric about 0.
What is the point of symmetry for the distribution of $V$?

2.10.20. Under the null hypothesis, show that $V_1$ and $V_2$, defined by
(2.8.1), have an asymptotic bivariate normal distribution when
properly standardized. What is the asymptotic variance-covariance
matrix? Hint: Use Theorem A13 in the Appendix.
2.10.21. Show that the normal scores generating function in Example
2.8.1 satisfies the conditions on $\phi(u)$ in Definition 2.8.2 by
showing that $\int_0^1 \phi(u)\,du = (2/\pi)^{1/2}$ and $\int_0^1 \phi^2(u)\,du = 1$.

2.10.22. Show for the Winsorized signed rank statistic in Example 2.8.2
that

$$n \operatorname{Var} V \to (1 - \gamma)^2(1 + 2\gamma)/12.$$

2.10.23. Construct a table of $EV$ and $n \operatorname{Var} V$ for various values of $n$ that
illustrates the rate of convergence to the asymptotic parameters in
Exercise 2.10.22.

2.10.24. Compute the exact and asymptotic mean and variance under the
null hypothesis of the modified sign statistic in Example 2.8.3
and discuss the asymptotic distribution. Find the asymptotic
testing tolerances.

2.10.25. a. Develop the Hodges-Lehmann estimate from the modified
sign statistic of Example 2.8.3. Show that it is possible to
compute the estimate without constructing the Walsh averages.

b. Write the formulas for the endpoints of the confidence interval
derived from the modified sign statistic.

c. Find the asymptotic estimation tolerance as a function of $\gamma$.

2.10.26. Suppose the score function $\phi(\cdot)$ is bounded and $\phi(0) = 0$. Suppose
also that $f'(x)$ exists. Show that $c$ in (2.9.3) can be written as





2.10.27. a. Use Exercise 2.10.26 to find the efficacy of the modified sign
statistic.

b. When $f$ is the $n(0, 1)$ pdf find the value of $\gamma$ that maximizes
the efficacy. Compare the efficiency of this test relative to the
$t$ test with that of the regular sign test relative to the $t$ test.
2.10.28. Recent studies show that hypertension drugs such as propranolol
may alleviate the symptoms of stage fright (Time Magazine, July
5, 1982, p. 58). To test this hypothesis, 29 professionals and
students gave two solo recitals before an audience of critics and
faculty members. Ninety minutes before the recitals they were
given either propranolol or a placebo. Heartbeat rate was measured
by remote electrocardiogram monitoring during the performance.
Normal resting heartbeat rate is 70 beats per minute.
Artificial data on 8 performers is as follows:

                           Performer
Treatment     1     2     3     4     5     6     7     8

Drug         85   107    69   122   106   121   137    87

Placebo     126   140    95   148   142   172   133   143

Let $\theta$ denote the median of the distribution of differences, placebo - drug.
Use the Wilcoxon signed rank procedures to test the
hypothesis $H_0: \theta = 0$ versus $H_A: \theta > 0$ at $\alpha = .05$, construct a
point estimate of $\theta$, and construct an approximate 95% confidence
interval for $\theta$.

2.10.29. Let $\Omega_s$ be as given by (2.1.1) and let $\Omega_u$ in $\Omega_s$ be the subclass of
unimodal symmetric distributions. Find the following infimums:

$$\inf_{\Omega_s} e(S, T) \quad \text{and} \quad \inf_{\Omega_u} e(S, T).$$

2.10.30. Show the asymptotic testing tolerance for $V_f$ generated by $\phi_f(u)$
in (2.9.5) is given by $\epsilon$ such that $f(F^{-1}(1 - \epsilon/2)) = f(0)/2$.

2.10.31. Show the influence curve for $\hat\theta_f$, the estimate corresponding to $V_f$
generated by $\phi_f(u)$ in (2.9.5), is

$$\Omega(y) = \frac{-f'(y)}{I(f) f(y)},$$

where $I(f)$ is Fisher's information, Definition 2.9.1.

2.10.32. Find $\phi_f(u)$, (2.9.5), for logistic and double exponential distributions.
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2 10J3. 

2.1034. 

2.1035. 


Let F^.(x) »» 9{x) it |x| < c and - t)) if |xl > < Use 

this distnbulion to show that the upper bound in Theorem 2 9 4 is 
sharp 

Suppose F,G G ft, and F<G Prove that ^r{S, T) < T) 
Suppose A’n.Vj. . ,Ar„ are independent with Xi^F(x — 6i), 
FeQ, For testing //fl. =» • •= versus 
with at least one stncl inequality, we consider Mann’s (1945) lest 
for trend S* ~ •*’.) slalislic S* compares X/ to 

all observations that come later, and hence makes more compan- 
sons than the Cox -Stuart lest in Exercise 1 8 13 
a Under it is easy to see that ES* = n(n — I)/4 A rather 
intricate counting argument, similar to the development of 
the variance of the Wilcoxon signed rank statistic in Theorem 
2 51, can be used to show that under Hq, Var5* “ n{n — I) 
(2n + 5)/72 Use the Projection Theorem 2 5 2 to show that 
S*, when properly standardized, is asymptotically normally 
distributed under then construct the approximate cntical 
value for a size a test 


b. Suppose $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a bivariate random sample
from $F(x, y)$, an absolutely continuous distribution. Two
observations $(X_i, Y_i)$ and $(X_j, Y_j)$ are concordant if both
members of one pair exceed the corresponding members of
the other pair. Concordance can be expressed as $(X_i - X_j)(Y_i - Y_j) > 0$.
If $(X_i - X_j)(Y_i - Y_j) < 0$ the observations are
called discordant. Let $P$ and $Q$ be the number of concordant
and discordant pairs and define

$$\tau = \frac{2(P - Q)}{n(n - 1)},$$

called Kendall's $\tau$. Show that $-1 \le \tau \le 1$. For testing $H_0$:
$(X, Y)$ independent, reject if $|\tau| \ge k$. Find the mean and
variance of $\tau$ under $H_0$. Prove that $(\tau - E\tau)/(\operatorname{Var} \tau)^{1/2}$ is
asymptotically $n(0, 1)$ under $H_0$ and construct the approximate
size $\alpha$ critical value. (See Theorem 4.4.2 for an alternative
approach.)

2 1036 Suppose A’,, , X„ 1 1 d F{x — 9), F G fto, with /(O) < oo and 

= fx^f(x)<ix < (a To test Hq F G ft, versus FGfto" 
ft,, Gastwirth (1971) proposed the sign test for symmetry Reject 
f/o if ISfA") - n/2| > k where = SiCA", - X) 
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a. Let S(X) = n~ ^S{X). Under prove that 

V^(5(J)- 1/2)^Z 

~«(0, 1/4 + 02/2(0) - f{0)j"jx\f{x)dx^. 

Outline: (1) Without loss of_generality let 0_= 0 and use 
Exercise 2.10.18 to write n'^\S — 1/2) = n‘/2(5'(0) — 1/2) — 
«'/y(0)Z + o^(l), (2) Use Theorem All in the Appendix to 
show that 



with 0,1 = 1/4, a,2=f(0)foxf{x)dx, 0 ^ 2 = f\0)a^, (3) Use 
Theorem A2(b) for vectors. 

b. Part (a) shows the test based on S'(A') is not distribution free 
under Hq. Suppose the naive user constructs a nominal 5% 
test by rejecting Hq if [5(2^) — n/2| >_\.96n^^^/2. Hence the 
critical region is constructed as if S{X) is roughly B{n, 1/2). 
Use part (a) to approximate the true level of the test if the 
underlying distribution is the normal, the double exponential 
and the symmetric Pareto distribution with f{x) = 3 /2(1 + 
1 x 1 )^ — 00 < X < 00. 

2.1037. Let X,, . . . , X„ be i.i.d. F(x — 9), F e Split the sample of n 
observations into p groups each of size a, a fixed. Let 
and reject Hq: 9 = 0 for : 9 > 0 if > k. Feustal and Davis- 
son (1967) call a mixed Wilcoxon test. 

a. Prove that (r„ — Fr„)/(Var r„)'/2 is asymptotically n(0, 1); 
hence k can be approximated from the normal table. 

b. It takes on the order of operations to compute T whereas 
it only takes on the order of pa^ for r„. For large n, the 
savings may be significant. In order to assess the efficiency 
loss due to grouping, find the efficiency e„ of T„ relative to T, 
and plot as a function of a. 

c. Find and interpret the lim„_»,e„. 



CHAPTER 3 


The Two-Sample Location Model 


3.1. INTRODUCTION

In this chapter we consider the problem of comparing two populations.
Methods based on ranks are developed for testing and estimation of
differences in location when the two populations have the same, but
arbitrary, shape. The main emphasis is on finite samples rather than
asymptotic results. We discuss the asymptotic distributions and describe the
extensions of efficiency from one- to two-sample test comparisons. We do
not discuss robustness aspects of the two-sample procedures since they are
similar to the one-sample results. Discussion can be found in the work of
Rieder (1982) and his references and in the work of Lambert (1982) and
her references.

The sampling model is defined by random samples $X_1, \ldots, X_m$ from
$F(x - \theta_x)$ and $Y_1, \ldots, Y_n$ from $F(x - \theta_y)$, $F \in \Omega_0$. Hence, $F$ is not assumed
to be symmetric but it does provide the same shape for the two
populations. Next let $\Delta = \theta_y - \theta_x$, the difference in the population medians.
We wish to test $H_0: \Delta = 0$ versus $H_A: \Delta > 0$ and to construct point
and interval estimates of $\Delta$. Without loss of generality we may take $\theta_x = 0$;
hence the sampled populations can be written $F(x)$ and $F(y - \Delta)$, respectively.
The difference in locations $\Delta$, not the locations themselves, is
considered.

Example 3.1.1. The randomization model provides an example. We begin
with $N$ subjects who are randomly assigned to a treatment group or a
control group. Suppose $X$ represents the control measurement and $Y$ the
treatment measurement. If the treatment acts to add a nonnegative constant
$\Delta$ to the control effect then we wish to test $H_0: \Delta = 0$ versus
$H_A: \Delta > 0$. Because the $N$ subjects originally came from a single population,
the foregoing model specifying a common shape under $H_0$ is appropriate.
The treatment may act to alter the distribution in some way other than
a location change. Whether the location test is sensitive to the other
changes is an issue discussed under consistency. See Example 3.5.1 for a
discussion of the Mann-Whitney-Wilcoxon test which is introduced in the
next section.

Finally we note that, in a comparison of attributes where randomization 
is impossible, the assumption of a common distribution shape is not 
automatically satisfied under $H_0$. For example, in comparing the grade
point averages of two schools we cannot assign subjects at random to the
schools. The next example also illustrates this point. A Behrens-Fisher rank 
test is introduced in Section 3.5. 

Example 3.1.2. Salk (1973) describes a study that analyzes the soothing 
effect of the mother’s heartbeat on her newborn infant. The infants were 
placed in a nursery immediately after birth and they remained there for 
four days except for normal feedings by their mothers. The treatment group 
(n = 102) was continuously exposed to the sound of an adult’s heartbeat 
(72 beats per minute at 85 decibels). The control group (m =112) consisted 
of another group of infants in the same nursery. Hence it appears that the 
babies were not assigned at random to treatment or control. 

One of the measurements consisted of the weight change in grams from 
the day following birth to the fourth day. It was hypothesized that the 
heartbeat group would gain more weight than the control group since the 
heartbeat group was expected to cry less. Let $X$ ($Y$) denote the control
(treatment) measurement. Suppose $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are samples
from $F(x)$ and $F(y - \Delta)$, $F \in \Omega_0$. Then we wish to test $H_0: \Delta = 0$
versus $H_A: \Delta > 0$. Note the similar shape assumption is an integral part of
the model and is not automatically satisfied under $H_0$ as it would be for a
randomized experiment. By monitoring the nursery, Salk estimated that the
heartbeat group cried 38% of the time and the control group 60% of the
time. Moreover, 70% of the heartbeat group, as opposed to 33% of the
control group, gained weight. The heartbeat group showed a median gain
of 40 grams; the control group showed a median loss of 20 grams. In
Example 3.2.2 we provide some data and illustrate a formal test of
$H_0: \Delta = 0$ versus $H_A: \Delta > 0$.
To test $H_0: \Delta = 0$ versus $H_A: \Delta > 0$, Wilcoxon (1945) proposed the following
simple procedure: First rank the combined data from smallest to largest
and compute $U$, the sum of ranks of the $Y$ observations, then reject $H_0$ if $U$
is "too large." Large values of $U$ indicate a shift of the $Y$ sample to the
right of the $X$ sample. It remains then to find the distribution (exact or
approximate) of $U$ under $H_0: \Delta = 0$ in order to determine the critical value
of the test. In Section 6.3 this test is extended to the two-sample multivariate
location model.

We first discuss the joint distribution of the ranks of the $Y$ observations
in the combined data and then apply the results to $U$. Let $N = m + n$ and
let $R_i$ denote the rank of $Y_i$ in the combined data; hence there are $R_i$
observations less than or equal to $Y_i$, and $N - R_i$ greater than $Y_i$.

Theorem 3.2.1. Under $H_0: \Delta = 0$, if $R_1, \ldots, R_n$ denote the ranks of
$Y_1, \ldots, Y_n$ in the combined data then

$$P(R_i = s) = \frac{1}{N}, \quad s = 1, \ldots, N,$$

$$P(R_i = s, R_j = t) = \begin{cases} \dfrac{1}{N(N - 1)} & \text{if } s \ne t \\ 0 & \text{if } s = t, \end{cases}$$

where $N = m + n$ and $i \ne j$.

Proof. Under $H_0: \Delta = 0$, the combined data constitute a random sample
of size $N = m + n$ from $F(x)$, $F \in \Omega_0$. Hence every permutation of the
combined sample is equally likely. The event $R_i = s$ is determined by the
$(N - 1)!$ sequences of $m$ $X$'s and $n$ $Y$'s in which $Y_i$ is always in the $s$th
position. Thus

$$P(R_i = s) = \frac{(N - 1)!}{N!} = \frac{1}{N}.$$

The $P(R_i = s, R_j = t)$ is determined in exactly the same way. Since $F \in \Omega_0$,
the probability of a tie, that is, $P(R_i = R_j)$, is zero.


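In Exercise 3.7.1 you are asked to show that under $H_0: \Delta = 0$, $ER_i = (N + 1)/2$,
$\operatorname{Var} R_i = (N^2 - 1)/12$, and $\operatorname{Cov}(R_i, R_j) = -(N + 1)/12$ for
$i \ne j$. Hence we have for $U = \sum_{i=1}^{n} R_i$,

$$EU = n(N + 1)/2, \qquad \operatorname{Var} U = mn(N + 1)/12. \qquad (3.2.1)$$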
A counting form of the statistic can also be given. Recall that $R_i$ is the
number of combined sample items less than or equal to $Y_i$; hence

$$R_i = \#(X_j < Y_i) + \#(Y_k \le Y_i), \quad j = 1, \ldots, m, \; k = 1, \ldots, n,$$

and

$$\sum_{i=1}^{n} R_i = \#(X_j < Y_i) + n(n + 1)/2, \quad j = 1, \ldots, m, \; i = 1, \ldots, n.$$

The $n(n + 1)/2$ represents the number of times $Y$ observations are less than
or equal to other $Y$ observations and can be easily determined by putting
the $Y$'s in order. Thus,

$$U = \sum_{i=1}^{n} R_i = \#(Y_i - X_j > 0) + n(n + 1)/2, \quad i = 1, \ldots, n, \; j = 1, \ldots, m,$$

and we let $W = \#(Y_i - X_j > 0)$ so that

$$U = W + n(n + 1)/2. \qquad (3.2.2)$$

The two statistics $U$ and $W$ have the same statistical properties; $U$ is the
rank sum form proposed by Wilcoxon, and $W$ is the counting form
proposed by Mann and Whitney (1947). This form of the statistic has been
traced by Kruskal (1957) to the work of Gustav Deuchler published in
German in 1914. Note also that

$$EW = mn/2, \qquad \operatorname{Var} W = mn(N + 1)/12. \qquad (3.2.3)$$

Now that we have the mean and variance for both $U$ and $W$ we need the
distribution under $H_0: \Delta = 0$ in order to construct tests and confidence
intervals. We consider the exact distribution first and then use the Projection
Theorem 2.5.2 to develop the asymptotic distribution. The following
example is similar to Example 2.2.1 for the one-sample case.
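The relation (3.2.2) between the rank sum form and the counting form can be checked directly on a small data set. The sketch below assumes no ties in the combined sample; the function names and the sample values are illustrative only.

```python
def rank_sum_U(x, y):
    """Sum of the ranks of the y's in the combined sample (no ties assumed)."""
    combined = sorted(list(x) + list(y))
    # Rank of a value = its 1-based position in the ordered combined sample.
    return sum(combined.index(v) + 1 for v in y)

def count_W(x, y):
    """Counting form: number of pairs (i, j) with y_i - x_j > 0."""
    return sum(1 for yi in y for xj in x if yi - xj > 0)

if __name__ == "__main__":
    x = [1.1, 3.2, 5.0]   # hypothetical control sample, m = 3
    y = [2.0, 4.5]        # hypothetical treatment sample, n = 2
    n = len(y)
    U, W = rank_sum_U(x, y), count_W(x, y)
    print(U, W, W + n * (n + 1) // 2)
```

Computing U needs only one sort of the m + n observations, while W visits all mn pairs, which is the practical trade-off between the two forms.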

Example 3.2.1. Take $m = n = 2$ and suppose the null hypothesis is true so
that the 2 $x$'s and 2 $y$'s are a sample of 4 observations. There are $\binom{4}{2} = 6$
equally likely arrangements which can be listed as follows:

    Ranks           1   2   3   4     W    U

    Arrangements    y   y   x   x     0    3
                    y   x   y   x     1    4
                    y   x   x   y     2    5
                    x   y   y   x     2    5
                    x   y   x   y     3    6
                    x   x   y   y     4    7

The distribution of $W$ (or $U$) under $H_0: \Delta = 0$ can be listed as

    w             0     1     2     3     4

    P(W = w)     1/6   1/6   1/3   1/6   1/6

Note the distribution-free nature of $W$ is illustrated since no assumption on
$F$ was necessary to determine the probabilities of $W$. As in the one-sample
case the statistic is no longer distribution free under alternatives. This
example shows that the distribution problem is reduced to counting the
number of sequences such that $W = w$. The symmetry of the null distribution
of $W$ is also illustrated.

Define $\bar P_{m,n}(k)$ to be the number of sequences of $m$ $x$'s and $n$ $y$'s such that
$W = k$. In the foregoing example $\bar P_{2,2}(2) = 2$. A recursive equation can now
be developed.
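The enumeration in this example can be reproduced by brute force. The following sketch lists every choice of ranks for the y's, tabulates W, and checks the null mean and variance against (3.2.3); it is meant for small m and n only, and the function name is an assumption made for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def null_distribution_W(m, n):
    """Exact null distribution of W by listing all C(m+n, n) rank sets."""
    N = m + n
    counts = Counter()
    for y_ranks in combinations(range(1, N + 1), n):
        U = sum(y_ranks)                 # rank sum of the y's
        W = U - n * (n + 1) // 2         # counting form via (3.2.2)
        counts[W] += 1
    total = sum(counts.values())
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}

if __name__ == "__main__":
    dist = null_distribution_W(2, 2)
    for w, p in dist.items():
        print(w, p)
    mean = sum(w * p for w, p in dist.items())
    var = sum((w - mean) ** 2 * p for w, p in dist.items())
    print("EW =", mean, " Var W =", var)
```

Because each rank set is equally likely under the null hypothesis, no assumption about F enters the calculation, which is the distribution-free property noted in the example.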

Theorem 3.2.2. Given $m$ $x$'s and $n$ $y$'s,

$$\bar P_{m,n}(k) = \bar P_{m,n-1}(k - m) + \bar P_{m-1,n}(k),$$

where $\bar P_{i,j}(k) = 0$ if $k < 0$, and $\bar P_{i,0}(k)$ and $\bar P_{0,j}(k)$ are 1 or 0 as $k = 0$ or $k \ne 0$.

Proof. Divide the sequences such that $W = k$ into two groups: those
ending in $x$ and those ending in $y$. In the former case, $W$ computed on all
but the last $x$ must equal $k$, since attaching an $x$ does not change the count
$W = \#(Y_j > X_i)$. Hence for these sequences of $m - 1$ $x$'s and $n$ $y$'s, $W = k$.
In the latter case, $W$ computed on all but the last $y$ must be $k - m$ because
the last $y$, when attached, will exceed $m$ $x$'s. Hence, for these sequences of
$m$ $x$'s and $n - 1$ $y$'s, $W = k - m$. Putting the two cases together yields the
equation for $\bar P_{m,n}(k)$.
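The recursion of Theorem 3.2.2 translates directly into a memoized function. This is a minimal sketch; the function name is an assumption, and the cache is only a convenience so that repeated subproblems are not recomputed.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_sequences(m, n, k):
    """Number of sequences of m x's and n y's with W = k (Theorem 3.2.2)."""
    if k < 0:
        return 0
    if m == 0 or n == 0:
        # Boundary: with no x's or no y's the only possible count is W = 0.
        return 1 if k == 0 else 0
    # Sequences ending in y contribute count_sequences(m, n - 1, k - m);
    # sequences ending in x contribute count_sequences(m - 1, n, k).
    return count_sequences(m, n - 1, k - m) + count_sequences(m - 1, n, k)

if __name__ == "__main__":
    print([count_sequences(2, 2, k) for k in range(5)])
    print(sum(count_sequences(4, 4, k) for k in range(17)))
```

For m = n = 2 the counts agree with the arrangements listed in Example 3.2.1, and summing over k recovers the total number of sequences.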



3.2. THE MANN-WHITNEY-WILCOXON RANK STATISTIC

Theorem 3.2.3. Under $H_0: \Delta = 0$ let $P_{m,n}(k) = P_{H_0}(W = k)$. Then

$$P_{m,n}(k) = \frac{n}{m + n} P_{m,n-1}(k - m) + \frac{m}{m + n} P_{m-1,n}(k),$$

with the boundary conditions given in the previous theorem.

Proof. Since the $\binom{m+n}{n}$ sequences of $m$ $x$'s and $n$ $y$'s are equally likely under
$H_0: \Delta = 0$, we have, from Theorem 3.2.2,

$$P_{m,n}(k) = \frac{m!\,n!}{(m + n)!}\left[\bar P_{m,n-1}(k - m) + \bar P_{m-1,n}(k)\right]$$

$$= \frac{n}{m + n} \cdot \frac{\bar P_{m,n-1}(k - m)}{\binom{m+n-1}{n-1}} + \frac{m}{m + n} \cdot \frac{\bar P_{m-1,n}(k)}{\binom{m+n-1}{n}}.$$

This equation can be used for computer generation of tables of probabilities
of $W$ (or $U$). A similar equation for the Wilcoxon signed rank statistic
is given in Exercise 3.7.3.
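As a sketch of the computer generation of tables mentioned above, the recursion for the null probabilities can be coded with exact rational arithmetic. The function names are assumptions made for this illustration, and the boundary conditions are those of the counting recursion.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def prob_W(m, n, k):
    """P(W = k) under the null hypothesis for sample sizes m and n."""
    if k < 0:
        return Fraction(0)
    if m == 0 or n == 0:
        return Fraction(1 if k == 0 else 0)
    return (Fraction(n, m + n) * prob_W(m, n - 1, k - m)
            + Fraction(m, m + n) * prob_W(m - 1, n, k))

def table_W(m, n):
    """Null probabilities P(W = 0), ..., P(W = mn) as a list."""
    return [prob_W(m, n, k) for k in range(m * n + 1)]

if __name__ == "__main__":
    for k, p in enumerate(table_W(3, 5)):
        print(k, p, float(p))
```

Using `Fraction` keeps the table exact, so the probabilities sum to one without rounding error; floating point could be substituted when m and n are large and speed matters more.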

In the next theorem we use the Projection Theorem 2.5.2 to develop the 
limiting distribution of W under Hq so that critical values can be approxi- 
mated. 

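Theorem 3.2.4. Suppose $m, n \to \infty$ in such a way that $m/N \to \lambda$, $0 < \lambda < 1$,
where $N = m + n$. Then $(W - EW)/(\operatorname{Var} W)^{1/2}$ and $(U - EU)/(\operatorname{Var} U)^{1/2}$
have limiting $n(0, 1)$ distributions under $H_0: \Delta = 0$.

Proof. Let $T_{ij} = 1$ if $Y_j > X_i$ and 0 otherwise; then $W = \sum_{i=1}^{m} \sum_{j=1}^{n} T_{ij}$. Under
$H_0$, $W^* = W - EW = \sum_{i=1}^{m} \sum_{j=1}^{n} (T_{ij} - 1/2)$.

Next note that $E[(T_{ij} - 1/2) \mid X_i = x] = P(Y_j > X_i \mid X_i = x) - 1/2$ and

$$E[(T_{ij} - 1/2) \mid X_k = x] = \begin{cases} 0 & \text{if } k \ne i \\ P(Y > x) - 1/2 & \text{if } k = i, \end{cases}$$

$$E[(T_{ij} - 1/2) \mid Y_k = y] = \begin{cases} 0 & \text{if } k \ne j \\ P(X < y) - 1/2 & \text{if } k = j. \end{cases}$$

Since $X$ and $Y$ have the same distribution $F$, under $H_0$ we have

$$E\Big[\sum_i \sum_j (T_{ij} - 1/2) \,\Big|\, X_k = x\Big] = -n(F(x) - 1/2),$$

$$E\Big[\sum_i \sum_j (T_{ij} - 1/2) \,\Big|\, Y_k = y\Big] = m(F(y) - 1/2).$$

The projection of $W^*$ is

$$V_p = -n \sum_{i=1}^{m} (F(X_i) - 1/2) + m \sum_{j=1}^{n} (F(Y_j) - 1/2).$$

Consider $N^{1/2} V_p/(mn) = (N/m)^{1/2} V_1 + (N/n)^{1/2} V_2$, where
$V_1 = -m^{-1/2} \sum_{i=1}^{m} (F(X_i) - 1/2)$ and $V_2 = n^{-1/2} \sum_{j=1}^{n} (F(Y_j) - 1/2)$, and where $F(X_i) - 1/2$ (and
$F(Y_j) - 1/2$) is uniformly distributed on $(-1/2, 1/2)$ with mean 0 and variance
1/12. We can apply the Central Limit Theorem to each of the two terms on
the right side. For example, $V_1 \xrightarrow{D} Z_1 \sim n(0, 1/12)$.

Now note that if $V_2 \xrightarrow{D} Z_2$ and $V_1$, $V_2$ are independent, then the
characteristic function of $V_1 + V_2$ converges to the characteristic function
of $Z_1 + Z_2$. Hence

$$N^{1/2} V_p/(mn) \xrightarrow{D} \lambda^{-1/2} Z_1 + (1 - \lambda)^{-1/2} Z_2,$$

where $Z_1$, $Z_2$ are independent $n(0, 1/12)$, and

$$N^{1/2} V_p/(mn) \xrightarrow{D} Z \sim n(0, 1/[12\lambda(1 - \lambda)]).$$

Further, $\operatorname{Var}(N^{1/2} V_p/(mn)) \to 1/[12\lambda(1 - \lambda)]$. From (3.2.3),

$$\operatorname{Var}(N^{1/2} W^*/(mn)) = N(N + 1)/(12mn) \to 1/[12\lambda(1 - \lambda)].$$

From the Projection Theorem 2.5.2, (2.5.7),

$$E\left[N^{1/2} W^*/(mn) - N^{1/2} V_p/(mn)\right]^2 = \operatorname{Var}(N^{1/2} W^*/(mn)) - \operatorname{Var}(N^{1/2} V_p/(mn)) \to 0,$$

so by Theorem 2.5.3, $N^{1/2} W^*/(mn)$ has the same normal limiting distribution
as $N^{1/2} V_p/(mn)$.
The proof is completed by writing out $[W - EW]/(\operatorname{Var} W)^{1/2}$ and
applying Slutsky's Theorem A3 (in the Appendix) along with the limiting
normality of $N^{1/2} W^*/(mn)$.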


Hence $P_{H_0}(W \le w) \doteq \Phi(t)$, where $t = (w + 0.5 - mn/2)/[mn(m + n + 1)/12]^{1/2}$.
The standardized value $t$ contains a continuity correction. The
approximation can be improved by using an Edgeworth expansion similar
to the one presented for the Wilcoxon signed rank statistic in (2.2.11). Fix
and Hodges (1955) show that

$$P(W \le w) \doteq \Phi(t) + \frac{m^2 + n^2 + mn + m + n}{20mn(m + n + 1)}\,(t^3 - 3t)\,\varphi(t),$$

where $\varphi(\cdot)$ is the $n(0, 1)$ pdf. Fix and Hodges present a table comparing the
normal approximation to the Edgeworth approximation. The simple normal
approximation, with continuity correction, is adequate for most purposes.
Bickel (1974) discusses the error of the Edgeworth approximation.
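The two approximations can be compared with the exact null distribution for small samples. The sketch below assumes the Edgeworth correction term has the form $[(m^2 + n^2 + mn + m + n)/(20mn(m + n + 1))](t^3 - 3t)\varphi(t)$; that form, the function names, and the choice m = n = 4 are assumptions of this illustration, and the exact values come from the counting recursion of Theorem 3.2.2.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def _count(m, n, k):
    """Number of sequences of m x's and n y's with W = k."""
    if k < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if k == 0 else 0
    return _count(m, n - 1, k - m) + _count(m - 1, n, k)

def exact_cdf(w, m, n):
    """Exact P(W <= w) under the null hypothesis."""
    return sum(_count(m, n, k) for k in range(w + 1)) / math.comb(m + n, n)

def _std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def _standardize(w, m, n):
    # Continuity-corrected standardized value t.
    return (w + 0.5 - m * n / 2.0) / math.sqrt(m * n * (m + n + 1) / 12.0)

def normal_cdf_approx(w, m, n):
    return _std_normal_cdf(_standardize(w, m, n))

def edgeworth_cdf_approx(w, m, n):
    t = _standardize(w, m, n)
    coef = (m * m + n * n + m * n + m + n) / (20.0 * m * n * (m + n + 1))
    pdf = math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)
    return _std_normal_cdf(t) + coef * (t ** 3 - 3.0 * t) * pdf

if __name__ == "__main__":
    m = n = 4
    for w in range(0, 5):
        print(w, exact_cdf(w, m, n),
              normal_cdf_approx(w, m, n), edgeworth_cdf_approx(w, m, n))
```

Even for samples this small the continuity-corrected normal value is close in the tail, which is consistent with the remark that the simple approximation is adequate for most purposes.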

For testing, it is more convenient to use $U = \sum R_i$ since it only requires
ranking (ordering) $m + n$ observations. The counting form $W = \#(Y_i - X_j > 0)$
requires the computation of $mn$ differences, which can become unmanageable
for moderate values of $m$ and $n$. The counting form is
important for the estimation of $\Delta$.

Note that $W = \#(Y_i - X_j > 0)$, $i = 1, \ldots, n$, $j = 1, \ldots, m$, is the sign
statistic computed on the $mn$ differences. Hence, following the discussion of
Section 1.5, the Hodges-Lehmann estimate of $\Delta$ is

$$\hat\Delta = \operatorname{med}(Y_i - X_j). \qquad (3.2.4)$$

Furthermore, if $P_{H_0}(W \le k) = \alpha/2$, then since the distribution of $W$ is
symmetric under $H_0: \Delta = 0$ from Exercise 3.7.5, we have

$$[D_{(k+1)}, D_{(mn-k)}) \qquad (3.2.5)$$

is a $(1 - \alpha)100\%$ confidence interval for $\Delta$ where $D_{(1)} \le \cdots \le D_{(mn)}$ are
the ordered differences $Y_i - X_j$, $i = 1, \ldots, n$, $j = 1, \ldots, m$. Using the
normal approximation, $k$ in (3.2.5) can be approximated (using a continuity
correction) by

$$k \doteq \frac{mn}{2} - 0.5 - Z_{\alpha/2}\left[\frac{mn(m + n + 1)}{12}\right]^{1/2}, \qquad (3.2.6)$$

where $Z_{\alpha/2}$ is the upper $\alpha/2$ percentile of the standard normal distribution.
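The estimate (3.2.4), the interval (3.2.5), and the approximation (3.2.6) can be sketched as follows. The function names and the toy data are assumptions made for this illustration; the interval function takes an integer k, so a value from (3.2.6) would be truncated to an integer first.

```python
import math
from statistics import median

def hodges_lehmann(x, y):
    """Median of the mn differences y_i - x_j, as in (3.2.4)."""
    return median(yi - xj for yi in y for xj in x)

def approx_k(m, n, z):
    """Normal approximation (3.2.6) to k, with continuity correction."""
    return m * n / 2.0 - 0.5 - z * math.sqrt(m * n * (m + n + 1) / 12.0)

def confidence_interval(x, y, k):
    """Endpoints D_(k+1) and D_(mn-k) of the ordered differences, as in (3.2.5)."""
    d = sorted(yi - xj for yi in y for xj in x)
    return d[k], d[len(d) - k - 1]   # zero-based indexing of D_(k+1), D_(mn-k)

if __name__ == "__main__":
    x = [1, 2, 3]          # hypothetical control sample
    y = [2, 4, 6]          # hypothetical treatment sample
    print(hodges_lehmann(x, y))
    print(confidence_interval(x, y, 1))
    print(approx_k(36, 20, 1.96))
```

With m = 36, n = 20, and Z = 1.96 the approximation gives a value just under 245, matching the sample sizes used in the example that follows.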
The Minitab statistical computing system contains commands that compute
the Mann-Whitney-Wilcoxon test, point estimate, and confidence interval.

Example 3.2.2. We return to the discussion of Example 3.1.2. The babies
were separated into three groups according to birth weight. We consider the
group of larger babies with birth weight of at least 3510 grams. There were
$n = 20$ babies in the treatment group ($Y$) and $m = 36$ in the control group
($X$). The data are given in Table 3.1. The data were reconstructed from a dot
graph in the article by Salk (1973).

For testing $H_0: \Delta = 0$ versus $H_A: \Delta > 0$, $\alpha = .05$ was specified and a
normal approximation was used to determine the critical value
$c = n(m + n + 1)/2 + 0.5 + 1.645[mn(m + n + 1)/12]^{1/2} = 666.7$, to use with
$U = \sum R_i$, where $R_1, \ldots, R_{20}$ are the ranks of the treatment observations in
the combined sample. For the data in Table 3.1 we have $U = 762.5 > c$;
hence we reject $H_0: \Delta = 0$ at $\alpha = .05$ and declare the treatment effect to be
significant. The value of $U$ was determined by assigning the average rank
to tied observations.

To assess the magnitude of the treatment effect we constructed a point
and interval estimate of $\Delta$. There are $mn = 720$ differences, $Y_j - X_i$,
$j = 1, \ldots, 20$, $i = 1, \ldots, 36$, and $\hat\Delta = \operatorname{med}(Y_j - X_i) = 60.0$. From (3.2.6),
using $Z_{\alpha/2} = 1.96$, $k = 244.8$. If we use $k = 244$, then $[D_{(245)}, D_{(476)}) =
[29.9, 100.0)$ is a slightly more than 95% confidence interval for $\Delta$. Note how
far the confidence interval misses 0, reflecting a strong rejection of $H_0: \Delta = 0$.
The computations were carried out using the Mann command in
Minitab. See Ryan et al. (1981).

It should be noted that ties are generally broken by assigning the average
rank to all observations in the group of tied observations. This corresponds
to assigning 1/2 to $Y - X$ differences that are zero. This is called the midrank
method and is discussed at some length by Lehmann (1975, Chapter 1,
Section 4). See also Putter (1955).
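The test in this example can be retraced with a short script. The data lists below are read from Table 3.1 (20 treatment and 36 control weight gains); the function name is an assumption of this sketch, and midranks are used for ties as described above.

```python
import math

# Weight gains in grams, as read from Table 3.1.
treatment = [190, 80, 80, 75, 50, 40, 30, 20, 20, 10,
             10, 10, 0, 0, -10, -25, -30, -45, -60, -85]
control = [140, 100, 100, 70, 25, 20, 10, 0, -10, -10, -25, -25,
           -25, -30, -30, -30, -45, -45, -45, -50, -50, -50, -60, -75,
           -75, -85, -85, -100, -110, -130, -130, -155, -155, -180, -240, -290]

def midrank_sum(x, y):
    """Sum of the midranks of the y's in the combined sample."""
    combined = list(x) + list(y)
    total = 0.0
    for v in y:
        less = sum(1 for c in combined if c < v)
        equal = sum(1 for c in combined if c == v)   # includes v itself
        # Tied values occupy ranks less+1, ..., less+equal; take their average.
        total += less + (equal + 1) / 2.0
    return total

if __name__ == "__main__":
    m, n = len(control), len(treatment)
    U = midrank_sum(control, treatment)
    # One-sided normal-approximation critical value at alpha = .05.
    c = n * (m + n + 1) / 2.0 + 0.5 + 1.645 * math.sqrt(m * n * (m + n + 1) / 12.0)
    print("U =", U, " c =", round(c, 1), " reject H0:", U > c)
```

The quadratic-time midrank computation is chosen for transparency; sorting the combined sample once would be the more efficient route for large samples.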

We can also study the behavior of the true significance level of $W$ on
serially correlated data. Similar to the serial correlation model introduced
at the end of Section 1.7 and discussed again at the end of Section 2.7, let
$(X_i, X_{i+1})$, $i = 1, 2, \ldots$, have a bivariate normal distribution with means 0,
variances 1, and correlation $\rho_x$. Likewise, define a sequence $(Y_i, Y_{i+1})$,
$i = 1, 2, \ldots$, independent from the $X$ sequence, for which the bivariate
normal distribution has means 0, variances 1, and correlation $\rho_y$. The
results of Serfling (1968b) show that the projection $V_p$, given in the proof of
Theorem 3.2.4, still determines the asymptotic distribution of $W$. In particular,
using Theorem A16, $N^{1/2} V_p/(mn)$ converges in distribution to
$(1/\lambda)^{1/2} Z_1 + [1/(1 - \lambda)]^{1/2} Z_2$ where $Z_1$ and $Z_2$ are independent normal
random variables with distributions $n(0, (1/12) + (1/\pi)\sin^{-1}(\rho_x/2))$ and


Table 3.1. Weight Gains for the Large Babies

    Treatment      Control
    (n = 20)       (m = 36)

      190.           140.
       80.           100.
       80.           100.
       75.            70.
       50.            25.
       40.            20.
       30.            10.
       20.             0.
       20.           -10.
       10.           -10.
       10.           -25.
       10.           -25.
        0.           -25.
        0.           -30.
      -10.           -30.
      -25.           -30.
      -30.           -45.
      -45.           -45.
      -60.           -45.
      -85.           -50.
                     -50.
                     -50.
                     -60.
                     -75.
                     -75.
                     -85.
                     -85.
                    -100.
                    -110.
                    -130.
                    -130.
                    -155.
                    -155.
                    -180.
                    -240.
                    -290.
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«(0,(1/12) + {l/ir)sin"'(p^/2)), ftspcctivcly See (2711) Then, as m The- 
orem 324, (N'^^/mn)(lV- ElV) is asymptotically /»(0.[12A(1 + 

(X;r)"'sm"'(p_,/2) + [(1 - A)ffr'sin“'(p,/2)) Hence, as m the case of the 
Wilcoxon signed rank statistic T, the Mann- Whitney- Wilcoxoti statistic ll' 
IS no longer distnbution free, even asymptotically, in the presence of senal 
correlation The true level of W is affected in the same way as that of T 
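The effect of serial correlation on the level can be illustrated numerically. The sketch below evaluates the limiting variance of $(N^{1/2}/mn)(W - EW)$ from the serial correlation discussion above and, as an added assumption not made in the text, approximates the true level of a one-sided test whose critical point is set from the independence variance $[12\lambda(1-\lambda)]^{-1}$. The function names are illustrative.

```python
from math import asin, pi, sqrt
from statistics import NormalDist


def asymptotic_variance(lam, rho_x, rho_y):
    """Limiting variance of (N^{1/2}/mn)(W - EW); lam is the limit of m/N."""
    return (1.0 / (12.0 * lam * (1.0 - lam))
            + asin(rho_x / 2.0) / (lam * pi)
            + asin(rho_y / 2.0) / ((1.0 - lam) * pi))


def true_level(alpha, lam, rho_x, rho_y):
    """Approximate level of the nominal-alpha one-sided test when the
    critical point ignores the serial correlation (an assumption of this
    sketch: upper-tail rejection with the rho = 0 variance)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    ratio = sqrt(asymptotic_variance(lam, 0.0, 0.0)
                 / asymptotic_variance(lam, rho_x, rho_y))
    return 1.0 - NormalDist().cdf(z * ratio)


level_independent = true_level(0.05, 0.5, 0.0, 0.0)
level_positive = true_level(0.05, 0.5, 0.3, 0.3)
```

With positive correlations the variance is inflated, so the approximate true level exceeds the nominal one.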


3.3. THE DISTRIBUTION OF RANKS UNDER ALTERNATIVES


In this section we consider the problem of computing the joint probabilities
of the ranks in general. Although the results can be used for computing
exact power of rank tests, our main goal is the construction of locally most
powerful rank tests for a given underlying model. These locally optimal
tests can then be compared to two-sample versions of the asymptotically
most powerful rank tests discussed after Theorem 2.9.1 for the one-sample
rank tests.

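Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be random samples from arbitrary,
absolutely continuous distributions denoted by $G(x)$ and $H(y)$, respectively,
with densities $g(x)$ and $h(y)$. Let $R_{(1)} < \cdots < R_{(n)}$ denote the
ranks of $Y_{(1)} < \cdots < Y_{(n)}$ in the combined data. The following result is
due to Hoeffding (1951).


Theorem 3.3.1. Suppose $h(x) > 0$ implies $g(x) > 0$. Then

$$P(R_{(1)} = r_1, \ldots, R_{(n)} = r_n) = \frac{1}{\binom{m+n}{n}}\, E\left[\prod_{i=1}^{n} \frac{h(V_{(r_i)})}{g(V_{(r_i)})}\right],$$

where $V_{(1)} < \cdots < V_{(m+n)}$ are the order statistics of a sample of size
$m + n$ from $G$.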

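Proof. We first consider the event $R_{(1)} = r_1, \ldots, R_{(n)} = r_n$, given
$y_{(1)} < \cdots < y_{(n)}$. Then there are $r_1 - 1$ x's less than $y_{(1)}$ and $r_2 - r_1 - 1$ x's
between $y_{(1)}$ and $y_{(2)}$, and so on. Let $r_0 = 0$ and $r_{n+1} = m + n + 1$, and use the
convention $G(y_{(0)}) = 0$, $G(y_{(n+1)}) = 1$. The conditional
distribution can be written as a multinomial probability by thinking of
tossing the $m$ x's into the $n + 1$ cells defined by the fixed, ordered y's. Hence

$$P(R_{(1)} = r_1, \ldots, R_{(n)} = r_n \mid y_{(1)} < \cdots < y_{(n)}) = m! \prod_{j=0}^{n} \frac{[G(y_{(j+1)}) - G(y_{(j)})]^{r_{j+1} - r_j - 1}}{(r_{j+1} - r_j - 1)!}$$

where, for example, the probability that an x falls into the second cell is
$P(y_{(1)} < X < y_{(2)}) = G(y_{(2)}) - G(y_{(1)})$. Now multiply by $n! \prod h(y_{(i)})$, the
joint density of $y_{(1)} < \cdots < y_{(n)}$, and integrate with respect to $y_{(1)}, \ldots, y_{(n)}$
to get the marginal distribution of $R_{(1)} < \cdots < R_{(n)}$. In the
following we also multiply and divide by $(m + n)! \prod g(y_{(i)})$:

$$P(R_{(1)} = r_1, \ldots, R_{(n)} = r_n) = \int \cdots \int \frac{m!\,n!}{(m+n)!}\, \frac{\prod_{i=1}^{n} h(y_{(i)})}{\prod_{i=1}^{n} g(y_{(i)})} \times \left\{ (m+n)! \prod_{j=0}^{n} \frac{[G(y_{(j+1)}) - G(y_{(j)})]^{r_{j+1} - r_j - 1}}{(r_{j+1} - r_j - 1)!} \prod_{i=1}^{n} g(y_{(i)}) \right\} dy_{(1)} \cdots dy_{(n)}.$$

Recognize the function in { } as the joint marginal density of
$V_{(r_1)} < \cdots < V_{(r_n)}$ from $V_{(1)} < \cdots < V_{(m+n)}$; see Wilks (1962, p. 237). Hence the
integral can be rewritten as the expectation stated in the theorem.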


This result has been used for the calculation of probabilities in various 
special cases. For most distributions of interest, the expression is intracta- 
ble. See Lehmann (1953) or Hayman and Govindarajulu (1966). We use the 
equation to develop locally optimal rank tests in the location model. The 
corresponding result for the one sample model is given in Exercise 3.7.18. 
In Exercise 3.7.20 locally optimal rank tests in the scale model are dis- 
cussed. 

Restrict attention to the location model with $G(x) = F(x)$ and
$H(y) = F(y - \Delta)$, $F \in \Omega_0$. Under $H_0\colon \Delta = 0$, the distribution of
$R_{(1)} < \cdots < R_{(n)}$ is uniform over the $\binom{m+n}{n}$ equally likely sequences.
This can be seen immediately from Theorem 3.3.1 with $g = h$. Hence, if $k$ is
an integer such that $k/\binom{m+n}{n} = \alpha$, then any set $C$ of $k$ rank vectors
$(r_1, \ldots, r_n)$ is a size $\alpha$ critical region for $H_0\colon \Delta = 0$. Our problem is
to determine the best critical region under some stated criterion.

The power of a size $\alpha$ critical region $C$ is given by

$$\beta(\Delta) = \sum_{(r_1, \ldots, r_n) \in C} \frac{1}{\binom{m+n}{n}}\, E\left[\prod_{i=1}^{n} \frac{f(V_{(r_i)} - \Delta)}{f(V_{(r_i)})}\right] \qquad (3.3.1)$$

where $V_{(1)} < \cdots < V_{(m+n)}$ are order statistics of a sample of size $m + n$
from $F(x)$.

Definition 3.3.1. The locally most powerful size $\alpha$ rank test (LMPRT) is
given by the size $\alpha$ critical region $C^*$ such that $\beta'(0)$ is a maximum. Here
$\beta'(0)$ is the derivative of $\beta(\Delta)$ evaluated at $\Delta = 0$.

This definition differs somewhat from the usual definition of a LMPRT;
see Lehmann (1959). The more stringent definition requires the rank test to
be uniformly most powerful among all rank tests in a sufficiently small
neighborhood of $\Delta = 0$. In our definition we only require the slope of the
power function be maximized at $\Delta = 0$; hence our LMPRT has the most
rapidly increasing power function at the null hypothesis. Unless the
situation is pathological, by continuity of the power function the LMPRT of
Definition 3.3.1 will be optimal in a neighborhood. In the next theorem we
suppose that $F$ has a differentiable density and derivatives with respect to
$\Delta$ may be passed through expectations.

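Theorem 3.3.2. Given $F \in \Omega_0$, suppose differentiation under the expectation
is valid; then the LMPRT rejects $H_0\colon \Delta = 0$ in favor of $H_A\colon \Delta > 0$ if

$$V = -\sum_{i=1}^{n} E\left[\frac{f'(V_{(R_i)})}{f(V_{(R_i)})}\right] \ge c,$$

where $c$ is determined by $P_{H_0}(V \ge c) = \alpha$.

Proof. From (3.3.1), differentiating under the expectation yields

$$\beta'(\Delta) = \sum_{(r_1, \ldots, r_n) \in C} \frac{1}{\binom{m+n}{n}}\, E\left[\sum_{i=1}^{n} \left\{-\frac{f'(V_{(r_i)} - \Delta)}{f(V_{(r_i)})}\right\} \prod_{j \ne i} \frac{f(V_{(r_j)} - \Delta)}{f(V_{(r_j)})}\right]$$

and

$$\beta'(0) = \sum_{(r_1, \ldots, r_n) \in C} \frac{1}{\binom{m+n}{n}} \sum_{i=1}^{n} E\left[-\frac{f'(V_{(r_i)})}{f(V_{(r_i)})}\right],$$

where $C$ is any set of $k$ rank vectors $(r_1, \ldots, r_n)$ such that $k/\binom{m+n}{n} = \alpha$.
To maximize $\beta'(0)$ we build up $C$ by including those rank vectors
$(r_1, \ldots, r_n)$ which yield the $k$ largest values of $-\sum_{i=1}^{n} E\{f'(V_{(r_i)})/f(V_{(r_i)})\}$.
Hence we can find a constant $c$ such that $V \ge c$ yields the size $\alpha$
critical region that maximizes $\beta'(0)$, and this completes the proof.

Suppose we generate a score $a(i)$ by defining

$$a(i) = E\left[-\frac{f'(V_{(i)})}{f(V_{(i)})}\right] \qquad (3.3.3)$$

where $V_{(1)} < \cdots < V_{(m+n)}$ are the order statistics from $F(x)$. Then the
LMPRT is provided by the statistic $V = \sum_{i=1}^{n} a(R_i)$, where $R_1, \ldots, R_n$ are
the ranks of the $Y$ observations in the combined data. We now consider
three examples and then turn to a discussion of general score statistics for
the two sample location model.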
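The construction of the critical region in the proof of Theorem 3.3.2 can be sketched by brute force for small samples: enumerate the $\binom{m+n}{n}$ equally likely rank sets, compute $V = \sum a(R_i)$ for each, and read off upper-tail probabilities. The helper names below are illustrative, and the Wilcoxon scores $a(i) = i$ with $m = n = 3$ are chosen only as a small worked case.

```python
from fractions import Fraction
from itertools import combinations


def null_distribution(scores, n):
    """Exact null distribution of V = sum of a(R_i) over the n Y-ranks.

    scores[i - 1] holds a(i) for i = 1..N; under H0 every set of n ranks
    has probability 1 / C(N, n).
    """
    values = [sum(scores[r - 1] for r in ranks)
              for ranks in combinations(range(1, len(scores) + 1), n)]
    dist = {}
    for v in values:
        dist[v] = dist.get(v, 0) + Fraction(1, len(values))
    return dist


def upper_tail(dist, c):
    """P(V >= c) under H0, i.e. the size of the region {V >= c}."""
    return sum(p for v, p in dist.items() if v >= c)


# Wilcoxon scores a(i) = i with m = n = 3, so there are C(6, 3) = 20 rank sets.
wilcoxon = null_distribution([1, 2, 3, 4, 5, 6], 3)
size_15 = upper_tail(wilcoxon, 15)   # only the rank set {4, 5, 6}
size_14 = upper_tail(wilcoxon, 14)   # adds the rank set {3, 5, 6}
```

Because $V$ is discrete, only certain sizes $\alpha = k/\binom{m+n}{n}$ are attainable, and tied values of $V$ may force several rank vectors into the region together.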


Example 3.3.1. Suppose $f(x)$ is the standard normal density; hence
$-f'(x)/f(x) = x$. The LMPRT is determined by $V = \sum_{i=1}^{n} a(R_i)$, called the normal
scores statistic, where $a(i) = E(V_{(i)})$ and $V_{(1)} < \cdots < V_{(m+n)}$ are the
$m + n$ order statistics from the $n(0, 1)$ distribution. For selected sample sizes
there are tables of expected values of normal order statistics available for
use in the computation of $V$ on specific data; see Fisher and Yates (1938),
who initially proposed the test. A natural approximation to $E(V_{(i)})$ is
$\Phi^{-1}(i/(m + n + 1))$, where $\Phi(\cdot)$ is the standard normal distribution function.
Hence an approximate normal scores statistic is given by
$V^* = \sum_{i=1}^{n} \Phi^{-1}[R_i/(N + 1)]$, first proposed by van der Waerden (1952). Compare
$V^*$ to the one-sample statistic in Example 2.8.1. Because of the smoothness
of $\Phi(\cdot)$, $V$ and $V^*$ are quite close. Computer packages may return $V^*$ as
the normal scores statistic. A simple equation, accurate to four decimal
places, for computing the normal scores (see Example 2.8.1) is given by

$$E V_{(i)} \doteq 4.91\,[p^{.14} - (1 - p)^{.14}]. \qquad (3.3.4)$$

For additional discussion see the papers by Terry (1952) and Klotz (1964).
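As a rough illustration of Example 3.3.1, the sketch below computes the van der Waerden scores $\Phi^{-1}(i/(N+1))$ with the standard library and compares them with the power-function approximation of (3.3.4). Evaluating (3.3.4) at $p = i/(N+1)$ is an assumption of this sketch; the plotting position $p$ used in Example 2.8.1 may differ.

```python
from statistics import NormalDist


def van_der_waerden_scores(N):
    """Approximate normal scores a*(i) = Phi^{-1}(i / (N + 1)), i = 1..N."""
    inv = NormalDist().inv_cdf
    return [inv(i / (N + 1)) for i in range(1, N + 1)]


def power_approximation(p):
    """The 4.91 [p^0.14 - (1 - p)^0.14] approximation to Phi^{-1}(p)."""
    return 4.91 * (p ** 0.14 - (1.0 - p) ** 0.14)


N = 10
exact = van_der_waerden_scores(N)
# Assumption: the same argument p = i / (N + 1) is used in both formulas.
approx = [power_approximation(i / (N + 1)) for i in range(1, N + 1)]
largest_gap = max(abs(e - a) for e, a in zip(exact, approx))
```

The scores are antisymmetric about the middle rank, so $\sum_{i=1}^{N} a^*(i) = 0$ up to rounding.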

Example 3.3.2. Suppose $f(x)$ is the logistic density given by
$f(x) = e^{-x}/(1 + e^{-x})^2$, $-\infty < x < \infty$. The distribution function is
$F(x) = 1/(1 + e^{-x})$, $-\infty < x < \infty$, and it is easy to check that
$f(x) = F(x)[1 - F(x)]$. Further, $-f'(x)/f(x) = 2F(x) - 1$. Applying (3.3.3) we have
$a(i) = E[2F(V_{(i)}) - 1]$. Exercise 3.7.7 implies that $EF(V_{(i)}) = i/(m + n + 1)$,
and hence

$$a(i) = \frac{2i}{m + n + 1} - 1.$$

We thus see that the LMPRT is determined by the Mann-Whitney-Wilcoxon
rank score test when the underlying distribution is logistic.

Example 3.3.3. Suppose $f(x)$ is the double exponential density
$f(x) = \tfrac{1}{2} e^{-|x|}$, $-\infty < x < \infty$. Except at $x = 0$, where the derivative is not
defined, $-f'(x)/f(x) = \mathrm{sgn}(x)$. Hence the LMPRT is determined by
$a(i) = E\,\mathrm{sgn}(V_{(i)})$, with $V_{(1)} < \cdots < V_{(m+n)}$ the order statistics from $f(x)$. As
in Example 3.3.1, we need tables of values of the expected signs of the
double exponential order statistics, and these are less numerous than normal
scores in the literature. Some tables can be found in Govindarajulu and
Eisenstat (1963). We can also consider an approximation to $a(i)$ by passing
the expectation through the function. Note that $x > 0$ if and only if
$2F(x) - 1 > 0$, and hence

$$a(i) = E\,\mathrm{sgn}[2F(V_{(i)}) - 1] \doteq \mathrm{sgn}\left[\frac{2i}{m + n + 1} - 1\right].$$

Now define

$$V^* = \sum_{i=1}^{n} \mathrm{sgn}\left[\frac{2R_i}{m + n + 1} - 1\right] = \sum_{i=1}^{n} \mathrm{sgn}\left[R_i - \frac{m + n + 1}{2}\right].$$

Now note that $\mathrm{sgn}[R_i - (m + n + 1)/2] = 1$ if and only if $R_i > (m + n + 1)/2$,
if and only if $Y_i$ is greater than the median of the combined sample.
Let $V^*_+$ ($V^*_-$) be the number of $Y$ observations greater (less) than the
median of the combined sample; then $V^* = V^*_+ - V^*_-$. Finally, when
$m + n$ is even, $V^*_+ + V^*_- = n$ and $V^* = 2V^*_+ - n$. A bit more care is needed for
$m + n$ odd, but it does not matter much for large samples. The statistic $V^*_+$
is called Mood's median statistic and was discussed by Mood (1950). The
statistic $V^*$ provides an approximation to $V$, the LMPRT. In this case,
however, the expectation was passed through the nonsmooth sign function.
This can result in some loss in power for small sample sizes. Conover et al.
(1978) show the LMPRT is a bit better than $V^*$ for small samples, even at
the double exponential distribution. Asymptotically they can be shown to
be equivalent, and in practice most people will use $V^*$ since tables for the
locally most powerful scores are not readily available. Later in this chapter
we will see that Mood's test is a two-sample analog of the sign test since it
has the same two-sample test efficiency as the sign test in the one sample
model.
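The counts that make up Mood's median statistic in Example 3.3.3 are easy to compute directly. The following minimal sketch uses made-up data with $m + n$ even, and the function name is illustrative.

```python
from statistics import median


def mood_counts(x, y):
    """Return (V_plus, V_minus): the numbers of y's strictly above and
    strictly below the median of the combined sample."""
    med = median(list(x) + list(y))
    v_plus = sum(v > med for v in y)
    v_minus = sum(v < med for v in y)
    return v_plus, v_minus


# Illustrative data: m = n = 4, so the combined median falls between
# the 4th and 5th order statistics and no observation equals it.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 5.0, 6.0, 7.0]
v_plus, v_minus = mood_counts(x, y)
v_star = v_plus - v_minus
```

With $m + n$ even and no ties, the relation $V^* = 2V^*_+ - n$ holds, as the example states.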


In general, if $a(i) = -E[f'(V_{(i)})/f(V_{(i)})]$ is the score for the LMPRT, an
approximate score can be defined by

$$a^*(i) = -\frac{f'(F^{-1}[i/(m + n + 1)])}{f(F^{-1}[i/(m + n + 1)])} \qquad (3.3.5)$$

where we first approximate $V_{(i)}$ by $F^{-1}(U_{(i)})$, where $U_{(1)} < \cdots < U_{(m+n)}$
are the order statistics from the uniform distribution on (0, 1), and then we
pass the expectation through the function and note $EU_{(i)} = i/(m + n + 1)$
from Exercise 3.7.7. Hence, in addition to the Mann-Whitney-Wilcoxon
statistic of Section 3.2, by selecting different density functions $f$, we can
generate a multitude of rank tests. Generally the approximation
$V^* = \sum_{i=1}^{n} a^*(R_i)$ is more practical than the actual LMPRT, $V = \sum_{i=1}^{n} a(R_i)$. In the
next section we study the distribution of general scores statistics under the
null hypothesis. The limiting normality is established so that critical points
for the tests can be approximated.
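The approximate scores of (3.3.5) for the three densities of Examples 3.3.1 to 3.3.3 can be sketched as functions of $u = i/(m+n+1)$. The function names and the small data set below are illustrative assumptions, and the rank computation assumes no ties.

```python
from statistics import NormalDist


def phi_normal(u):
    """Normal density: -f'/f evaluated at F^{-1}(u) is Phi^{-1}(u)."""
    return NormalDist().inv_cdf(u)


def phi_logistic(u):
    """Logistic density: -f'/f = 2F - 1, so the score function is 2u - 1."""
    return 2.0 * u - 1.0


def phi_double_exponential(u):
    """Double exponential density: sgn(2u - 1)."""
    return (u > 0.5) - (u < 0.5)


def combined_ranks(x, y):
    """Ranks of the y's in the combined sample (assumes no ties)."""
    pooled = sorted(list(x) + list(y))
    return [pooled.index(v) + 1 for v in y]


def approximate_score_statistic(x, y, phi):
    """V* = sum over the y's of phi(R_i / (N + 1))."""
    N = len(x) + len(y)
    return sum(phi(r / (N + 1)) for r in combined_ranks(x, y))


x = [1.0, 2.0, 3.0]
y = [2.5, 4.0]            # ranks 3 and 5 in the combined sample of size 5
v_logistic = approximate_score_statistic(x, y, phi_logistic)
v_sign = approximate_score_statistic(x, y, phi_double_exponential)
v_normal = approximate_score_statistic(x, y, phi_normal)
```

For the logistic case the statistic is a linear function of the rank sum, $V^* = 2\sum R_i/(N+1) - n$, which is the sense in which it reproduces the Mann-Whitney-Wilcoxon test.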


3.4. GENERAL SCORES 

In this section we outline the major results for the asymptotic distribution
theory under the null hypothesis. The situation is more complicated than
that of the one-sample location model where independence implied the
Central Limit Theorem was sufficient to establish asymptotic normality of
the general scores statistic.

In the two-sample model we no longer have the required independence, 
and it is necessary to resort to projection techniques. The basic results are 
proved by Hajek and Sidak (1967, Chapter V). They were able to prove a 
two-sample analog to Theorem 2.8.1 and it is described later. Before stating 
these results we develop the moments of the general scores statistics. 

As usual, we have $m$ $X$ observations and $n$ $Y$ observations with
$R_1, \ldots, R_n$ the ranks of $Y_1, \ldots, Y_n$ in the combined data. Similar to
Definition 2.8.1, we have:

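Definition 3.4.1. Let $0 = a(0) \le a(1) \le \cdots \le a(N)$, $N = m + n$, be a
nonconstant sequence; then $V = \sum_{i=1}^{n} a(R_i)$ is called a general scores statistic.

We may also write $a_i = a(i)$.

Theorem 3.4.1. Under the null hypothesis $H_0\colon \Delta = 0$, in the location
model,

$$EV = n\bar a = \frac{n}{N} \sum_{i=1}^{N} a_i, \qquad \mathrm{Var}\,V = \frac{mn}{N(N-1)} \sum_{i=1}^{N} (a_i - \bar a)^2.$$

Proof. First note $EV = \sum_{i=1}^{n} Ea(R_i) = nEa(R_1)$. Now, from Theorem 3.2.1,
$Ea(R_1) = \sum_{i=1}^{N} a_i/N = \bar a$. Further, $Ea^2(R_1) = \sum_{i=1}^{N} a_i^2/N$ and
$\mathrm{Var}\,a(R_1) = Ea^2(R_1) - [Ea(R_1)]^2 = \sum_{i=1}^{N} (a_i - \bar a)^2/N$. From
Theorem 3.2.1,

$$\mathrm{Cov}(a(R_1), a(R_2)) = E[(a(R_1) - \bar a)(a(R_2) - \bar a)] = \frac{1}{N(N-1)} \sum_{i \ne j} (a_i - \bar a)(a_j - \bar a).$$

Since $\sum_{j \ne i} (a_j - \bar a) = \sum_{j} (a_j - \bar a) - (a_i - \bar a) = -(a_i - \bar a)$, we have

$$\mathrm{Cov}(a(R_1), a(R_2)) = -\frac{1}{N(N-1)} \sum_{i=1}^{N} (a_i - \bar a)^2$$

and

$$\mathrm{Var}\,V = \mathrm{Var} \sum_{i=1}^{n} a(R_i) = n\,\mathrm{Var}[a(R_1)] + n(n-1)\,\mathrm{Cov}(a(R_1), a(R_2)),$$

which combines to yield the desired result.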

In addition, in Exercise 3.7.4 it is pointed out that $V$ has a symmetric
distribution under the null hypothesis, provided $a_i + a_{N-i+1}$ has the same
value for all $i = 1, \ldots, N$. The point of symmetry is $n\bar a$.

We now introduce the score-generating function for the two-sample
model.

Definition 3.4.2. Suppose $\phi(u)$, $0 < u < 1$, is nondecreasing. Suppose further
that $0 < \int_0^1 (\phi(u) - \bar\phi)^2\,du < \infty$, where $\bar\phi = \int_0^1 \phi(u)\,du$. Define
$a(i) = \phi(i/(N + 1))$, $N = m + n$; then $V = \sum_{i=1}^{n} a(R_i)$ is the general scores
statistic generated by the score-generating function.

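Hajek and Sidak (1967, p. 163) prove that if $V$ satisfies Definition 3.4.2
and if $\min(m, n) \to \infty$, then $[V - EV]/(\mathrm{Var}\,V)^{1/2}$ has a standard normal
limiting distribution. Now we suppose that $N \to \infty$ and $m/N \to \lambda$, $0 < \lambda < 1$,
so that neither sample size dominates asymptotically. Then, writing
$\bar V = N^{-1} \sum_{i=1}^{n} a(R_i)$, we have

$$E\bar V = \frac{n}{N} \cdot \frac{1}{N} \sum_{i=1}^{N} a_i \to (1 - \lambda)\bar\phi \qquad (3.4.1)$$

and

$$N\,\mathrm{Var}\,\bar V = \frac{mn}{N^2(N-1)} \sum_{i=1}^{N} (a_i - \bar a)^2 \to \lambda(1 - \lambda) \int_0^1 (\phi(u) - \bar\phi)^2\,du. \qquad (3.4.2)$$

Hence the asymptotic normality can be expressed in terms of the
asymptotic parameters as:

$$\frac{\sqrt{N}\,[\bar V - (1 - \lambda)\bar\phi]}{\left\{\lambda(1 - \lambda) \int_0^1 (\phi(u) - \bar\phi)^2\,du\right\}^{1/2}} \qquad (3.4.3)$$

has a standard normal limiting distribution. For additional discussion and
development of this result see Chapter 8 of Randles and Wolfe (1979).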

If we let $\phi(u) = -f'(F^{-1}(u))/f(F^{-1}(u))$, then, provided $f$ has finite
Fisher information (Definition 2.9.1), Hajek and Sidak (1967) show that
Definition 3.4.2 is satisfied. Hence, from (3.3.5), $V^* = \sum_{i=1}^{n} a^*(R_i)$, the
approximation to the LMPRT, is asymptotically normally distributed. The scores
(3.3.3) which define the LMPRT $V$ do not have a score-generating function
in the sense of Definition 3.4.2. However, Hajek and Sidak (1967, p. 165)
show that $V$ and $V^*$, when properly standardized, have the same limiting
distribution. In fact, they have a stronger result that corresponds directly to
(3.4.1), (3.4.2), and (3.4.3). Namely, if $\phi$ satisfies Definition 3.4.2 and if
we define $a_i = E\phi(U_{(i)})$, where $U_{(1)} < \cdots < U_{(N)}$ are the order statistics
from a uniform distribution on (0, 1), then (3.4.1)-(3.4.3) hold for
$\bar V = \sum_{i=1}^{n} a(R_i)/N$. The point is that we can use either $E\phi(U_{(i)})$ or $\phi(EU_{(i)})$ to
define the statistic $V$, and it does not matter asymptotically.

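Example 3.4.1. Let $\phi(u) = \Phi^{-1}(u)$, where $\Phi(\cdot)$ is the standard normal
distribution function. Then

$$\bar V = \frac{1}{N} \sum_{i=1}^{n} \Phi^{-1}\left(\frac{R_i}{N + 1}\right)$$

is the approximation to the LMPRT for underlying normal distributions
discussed in Example 3.3.1. We now check the square integrability condition
in Definition 3.4.2. Note

$$\bar\phi = \int_0^1 \Phi^{-1}(u)\,du = \int_{-\infty}^{\infty} t\,d\Phi(t) = 0;$$

hence

$$\int_0^1 (\phi(u) - \bar\phi)^2\,du = \int_{-\infty}^{\infty} t^2\,d\Phi(t) = 1.$$

Thus, if we replace $\lambda$ by $m/N$ and $1 - \lambda$ by $n/N$, we have $(N^3/mn)^{1/2}\,\bar V$ is
approximately $n(0, 1)$. This approximation, used in conjunction with (3.3.4)
for computing $\bar V$, makes the normal scores test practical.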

Example 3.4.2. We now consider Mood's statistic, first mentioned in
Example 3.3.3. The statistic $V^*_+$, the number of $Y$ observations that exceed
the median of the combined sample, can be written $V^*_+ = \sum_{i=1}^{n} \phi(R_i/(N + 1))$,
where

$$\phi(u) = \begin{cases} 0 & 0 < u \le 1/2 \\ 1 & 1/2 < u < 1. \end{cases}$$

We consider the case $N = m + n = 2r$, $n \le m$, in detail. Since
$a_i = \phi(i/(N + 1)) = 0$ if $i = 1, 2, \ldots, r$ and 1 if $i = r + 1, \ldots, N$, it is easy to check

$$EV^*_+ = \frac{n}{2}, \qquad \mathrm{Var}\,V^*_+ = \frac{mn}{N(N-1)} \sum_{i=1}^{N} (a_i - \bar a)^2 = \frac{mn}{4(N-1)}.$$

With $n \le m$, the exact distribution under the null hypothesis is
hypergeometric with

$$P(V^*_+ = k) = \frac{\binom{r}{k}\binom{r}{n-k}}{\binom{2r}{n}}$$

for $k = 0, 1, \ldots, n$. Further, since $\phi(\cdot)$ is bounded, Definition 3.4.2 applies
and $[V^*_+ - n/2]/[mn/4(m + n - 1)]^{1/2}$ is limiting $n(0, 1)$.
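The hypergeometric null distribution of Mood's statistic in Example 3.4.2 can be tabulated directly. The sketch below covers only the case treated in the example ($N = m + n = 2r$ and $n \le m$); the function name and the sample sizes are illustrative.

```python
from fractions import Fraction
from math import comb


def mood_null_pmf(m, n):
    """P(V+* = k) for k = 0..n when N = m + n = 2r and n <= m."""
    N = m + n
    if N % 2 or n > m:
        raise ValueError("this sketch covers only N even and n <= m")
    r = N // 2
    return [Fraction(comb(r, k) * comb(r, n - k), comb(N, n))
            for k in range(n + 1)]


m, n = 6, 4
pmf = mood_null_pmf(m, n)
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
```

The mean and variance computed from the probabilities agree with $n/2$ and $mn/4(N-1)$.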

In Section 2.8 we introduced several examples of one-sample scores 
statistics; recall Examples 2.8.2 and 2.8.3. We now discuss the construction 
of two-sample scores statistics from one-sample scores statistics. 

Example 3.4.3. Suppose we have a one-sample score-generating function
$\phi^+(u)$, $0 < u < 1$, satisfying Definition 2.8.2. We will let $V^+$ denote the
one-sample statistic defined in (2.8.4). Now extend $\phi^+(u)$ to $(-1, 1)$ by
$\phi^+(-u) = -\phi^+(u)$ and define $\phi(u) = \phi^+(2u - 1)$, $0 < u < 1$. Alternately,
define $\phi(u) = \phi^+(2u - 1)$ if $1/2 \le u < 1$ and $-\phi^+(1 - 2u)$ if $0 < u < 1/2$.
Then $V$ denotes the corresponding two-sample statistic. Further,

$$\bar\phi = \int_0^1 \phi(u)\,du = \int_0^1 \phi^+(2u - 1)\,du = \frac{1}{2} \int_{-1}^{1} \phi^+(v)\,dv = \frac{1}{2}\left\{\int_{-1}^{0} \phi^+(v)\,dv + \int_0^1 \phi^+(v)\,dv\right\} = 0$$

and

$$\int_0^1 \phi^2(u)\,du = \frac{1}{2} \int_{-1}^{1} [\phi^+(v)]^2\,dv = \int_0^1 [\phi^+(v)]^2\,dv.$$

As an immediate application, let $\phi^+(u) = u$, $0 < u < 1$, so that $V^+$ is the
Wilcoxon signed rank statistic. Then $\phi(u) = 2u - 1$, $0 < u < 1$, and $V$ is the
centered Mann-Whitney-Wilcoxon statistic. If we take $\phi^+(u) = 1$, $0 < u < 1$,
corresponding to $V^+$, the sign statistic, then $\phi(u) = 1$ if $1/2 < u < 1$
and $-1$ if $0 < u < 1/2$, and hence $V$ is the centered Mood's statistic. If
$\phi^+(u) = \Phi^{-1}[(u + 1)/2]$, the generating function for the one-sample normal
scores statistic, then $\phi(u) = \Phi^{-1}(u)$, generating the two-sample normal
scores statistic. In Exercise 3.7.8 the two-sample statistics corresponding to
the Winsorized signed rank statistics and the modified sign statistics of
Examples 2.8.2 and 2.8.3 are discussed. Later we will see that $V^+$ and $V$
have the same efficiency properties. In particular, if $V^+$ is the AMPRT (see
the discussion following Theorem 2.9.1), then $V$ will be asymptotically most
powerful in the two-sample model (see Section 3.5).

The asymptotic theory of Hajek and Sidak (1967) discussed previously
makes it possible to approximate the critical point for a broad range of
two-sample tests. Provided the expectation in (3.3.3) can be found or is
tabled, the critical point for the LMPRT can be approximated. The
approximation to the locally most powerful score (3.3.5) is one of several
applications of the use of score-generating functions. The asymptotic
normality is critical in the case of general scores since, unlike the
Mann-Whitney-Wilcoxon statistic, the exact distribution under the null
hypothesis is seldom tabled or available. In most cases the null distribution
of the test statistic is symmetric and the normal approximation provides a good
approximation even for small sample sizes.

We next discuss a counting form for the two-sample general scores
statistic and use this to construct a Hodges-Lehmann estimate of $\Delta$. Recall
from Section 3.2 that $\sum R_i = W + n(n + 1)/2$ where $W = \#(Y_i - X_j > 0)$,
$i = 1, \ldots, n$, $j = 1, \ldots, m$. For the general scores statistic, define
$V(\Delta) = \sum_{i=1}^{n} a(R_i(\Delta))$ where $R_i(\Delta)$ is the rank of $Y_i - \Delta$ among
$X_1, \ldots, X_m, Y_1 - \Delta, \ldots, Y_n - \Delta$. In Exercise 3.7.9 you are asked to show that $V(\Delta)$
decreases by $a_{i+j} - a_{i+j-1}$ as $\Delta$ crosses $Y_{(j)} - X_{(i)}$. Hence the $mn$ pairwise
differences play the same role in the two-sample problem as the Walsh
averages in the one-sample case. Further, if $T_{ij}(\Delta) = 1$ when $Y_{(j)} - X_{(i)} - \Delta > 0$
and 0 otherwise, then

$$V(\Delta) = Q(\Delta) + \sum_{i=1}^{n} a_i \qquad (3.4.4)$$

where

$$Q(\Delta) = \sum_{i=1}^{m} \sum_{j=1}^{n} (a_{i+j} - a_{i+j-1})\,T_{ij}(\Delta). \qquad (3.4.5)$$

Here $Q = W$ of Section 3.2 when $a_i = i$, $i = 1, \ldots, m + n$. Compare these
definitions and results to those of Section 2.8. In the same spirit as Section
2.8, if we restrict attention to scores such that $a_{i+j} - a_{i+j-1} = 1/(N + 1)$ or
0, then it is simple to describe the estimate $\hat\Delta$. Let

$$B = \{(i, j) : a_{i+j} - a_{i+j-1} = 1/(N + 1)\}, \qquad (3.4.6)$$

then, when the distribution of $V$ is symmetric (see Exercise 3.7.5),

$$\hat\Delta = \operatorname*{med}_{(i,j) \in B}\,(Y_{(j)} - X_{(i)}). \qquad (3.4.7)$$
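The agreement between the rank form and the counting form, $V = \sum_{j \le n} a_j + \sum_i \sum_j (a_{i+j} - a_{i+j-1})\,T_{ij}$ with $T_{ij} = 1$ when $Y_{(j)} > X_{(i)}$, can be checked numerically. The sketch below takes $\Delta = 0$, assumes no ties, and uses arbitrary integer scores; all names and data are illustrative.

```python
def rank_form(x, y, a):
    """V = sum over the y's of a[R_i], R_i the rank in the combined sample.

    `a` is a list with a[0] unused and a[i] the score of rank i, i = 1..N.
    """
    pooled = sorted(x + y)
    return sum(a[pooled.index(v) + 1] for v in y)


def counting_form(x, y, a):
    """Counting form: sum of the first n scores plus the score increments
    a[i + j] - a[i + j - 1] over the pairs with Y_(j) > X_(i)."""
    xs, ys = sorted(x), sorted(y)
    q = sum(a[i + j] - a[i + j - 1]
            for i in range(1, len(xs) + 1)
            for j in range(1, len(ys) + 1)
            if ys[j - 1] > xs[i - 1])
    return q + sum(a[1:len(ys) + 1])


x = [0.3, 1.7, 2.2, 5.0]
y = [1.0, 2.0, 6.5]
a = [None, 2, 3, 5, 8, 13, 21, 34]   # arbitrary increasing scores, N = 7
v_rank = rank_form(x, y, a)
v_count = counting_form(x, y, a)
```

With $a_i = i$ every increment equals 1, so the double sum reduces to the Mann-Whitney-Wilcoxon count $W$.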

We complete this section with an application of these ideas to Mood’s 
median statistic; additional examples can be found in the exercises. 

Example 3.4.4. This example is a continuation of Example 3.4.2 on
Mood's median statistic. We consider the case $N = m + n = 2r$, $n \le m$. Let
$a_i = 0$ for $i = 1, \ldots, r$ and 1 for $i = r + 1, \ldots, N$. Mood's statistic is
$V^*_+ = \sum_{i=1}^{n} a(R_i)$, and $a_{i+j} - a_{i+j-1} = 1$ only for $j + i = r + 1$. Hence, since
$\sum_{i=1}^{n} a_i = 0$, $V^*_+ = \#(Y_{(j)} - X_{(i)} > 0)$ for $j + i = r + 1$, and the estimate of $\Delta$
is

$$\hat\Delta = \operatorname*{med}_{i + j = r + 1}\,(Y_{(j)} - X_{(i)}).$$

In this case, we can determine the ordering of the relevant differences by
only knowing the order statistics for the individual samples. We simply
note that

$$Y_{(1)} - X_{(r)} < Y_{(2)} - X_{(r-1)} < \cdots < Y_{(n)} - X_{(r-n+1)}.$$

For the case $n = 2s - 1$ and $m = 2t - 1$, so that $r = s + t - 1$, we see at once that
$\hat\Delta = Y_{(s)} - X_{(r-s+1)} = Y_{(s)} - X_{(t)} = \operatorname{med} Y - \operatorname{med} X$. Recall that
$V^*_+ = Q$, and the counting form does not differ from the rank form. Hence,
if $P_{H_0}(V^*_+ \le k) = \alpha/2$, determined from the hypergeometric distribution or
the normal approximation, then the lower and upper ends of the $(1 - \alpha)$
100% confidence interval for $\Delta$ are

$$\hat\Delta_L = Y_{(k+1)} - X_{(r-k)} \qquad (3.4.8)$$

and

$$\hat\Delta_U = Y_{(n-k)} - X_{(r-n+k+1)}.$$

Recall the equivalence between hypothesis testing and confidence intervals.
To test $H_0\colon \Delta = 0$ versus $H_A\colon \Delta \ne 0$, we reject $H_0$ at level $\alpha$ if $V^*_+ \le k$ or
$V^*_+ \ge n - k$, where $P_{H_0}(V^*_+ \le k) = \alpha/2$. This is equivalent to rejecting $H_0$
at level $\alpha$ if 0 is not contained in $[\hat\Delta_L, \hat\Delta_U)$, the $(1 - \alpha)$ 100% confidence
interval.

This formulation shows that when the observations are taken sequentially
(that is, we observe the order statistics) we may be able to terminate
Mood's test before observing all $m + n$ observations. This has applications
in life testing in which time until death is observed. From the last
paragraph, if $P_{H_0}(V^*_+ \le k) = \alpha/2$, then Mood's test rejects $H_0\colon \Delta = 0$ if

$$Y_{(k+1)} > X_{(r-k)}$$

or

$$Y_{(n-k)} < X_{(r-n+k+1)}.$$

Hence, for example, if we observe $X_{(r-k)}$ before $Y_{(k+1)}$, we can terminate
the experiment and reject $H_0\colon \Delta = 0$. For more discussion see Gastwirth
(1968). Comparison with another median test is given in Exercise 3.7.11. A
discussion of early termination with the Mann-Whitney-Wilcoxon test is
given by Alling (1963).

Figure 3.1. Sign-type confidence intervals for $\theta_x$ and $\theta_y$.

We now relate this inference on $\Delta$, based on Mood's procedures, to the
sign procedures discussed in Chapter 1. We consider the case $m = n$ for
simplicity; hence $m + n = 2n = 2r$ and $r = n$. We then have
$\hat\Delta_L = Y_{(k+1)} - X_{(n-k)}$ and $\hat\Delta_U = Y_{(n-k)} - X_{(k+1)}$. This interval can be broken
apart as follows: consider the separate intervals $[X_{(k+1)}, X_{(n-k)}]$ and
$[Y_{(k+1)}, Y_{(n-k)}]$. Then $\hat\Delta_L$ is the difference in lower and upper ends of the $Y$ and $X$
intervals, respectively; likewise for $\hat\Delta_U$; see Fig. 3.1. A little thought and Fig.
3.1 show that 0 is not contained in $[\hat\Delta_L, \hat\Delta_U]$ if and only if the two intervals
$[X_{(k+1)}, X_{(n-k)}]$ and $[Y_{(k+1)}, Y_{(n-k)}]$ are disjoint. That is, Mood's test
rejects $H_0\colon \Delta = 0$ if and only if the intervals derived from the sign statistic
are disjoint. Finally, we need the confidence coefficients for the sign
intervals. This is determined by $P(S \le k)$, where $k$ is originally determined
from $P_{H_0}(V^*_+ \le k) = \alpha/2$. For example, using normal approximations,
if we take $1 - \alpha = 0.95$, so $k = n/2 - 2[n^2/4(2n - 1)]^{1/2}$ from Example
3.4.2 with equal sample sizes, then $P(S \le k) \doteq \Phi(-2^{1/2}) = 0.08$.
Hence, the sign intervals have approximately 84% confidence coefficients.
In summary, for equal sample sizes, if we reject $H_0\colon \Delta = 0$ when the two
84% confidence intervals are disjoint, then this is equivalent to a two-sided
5% Mood's median test. The Hodges-Lehmann estimate of $\Delta = \theta_y - \theta_x$ is
the difference in medians, and a 95% confidence interval for $\Delta$ is found by
taking the appropriate differences in the ends of the 84% sign confidence
intervals. Exercise 3.7.14 outlines the case of unequal sample sizes.
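The equal-sample-size relationship between Mood's interval for $\Delta$ and the two sign intervals can be sketched as follows. The value of $k$ is taken as given (in practice it would come from the hypergeometric distribution or its normal approximation), and the function names and data are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def mood_intervals(x, y, k):
    """For m = n, return the sign intervals [X_(k+1), X_(n-k)] and
    [Y_(k+1), Y_(n-k)] together with (Delta_L, Delta_U)."""
    n = len(x)
    if len(y) != n or not 0 <= k < n - k:
        raise ValueError("need equal sample sizes and 0 <= k < n - k")
    xs, ys = sorted(x), sorted(y)
    x_int = (xs[k], xs[n - k - 1])      # 0-based positions of X_(k+1), X_(n-k)
    y_int = (ys[k], ys[n - k - 1])
    delta = (y_int[0] - x_int[1], y_int[1] - x_int[0])
    return x_int, y_int, delta


def disjoint(a, b):
    """True when the closed intervals a and b do not overlap."""
    return a[1] < b[0] or b[1] < a[0]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
x_int, y_int, delta = mood_intervals(x, y, 1)
rejects = not (delta[0] <= 0 <= delta[1])

# Normal-approximation tail probability quoted in the text, roughly 0.08.
sign_tail = NormalDist().cdf(-sqrt(2))
```

In this made-up example the two sign intervals are disjoint, and correspondingly 0 lies outside the interval for $\Delta$.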

Example 3.4.5. In this example we construct the Kolmogorov-Smirnov
two-sample statistic.

First, define a score

$$a_j(i) = \begin{cases} 0 & \text{if } 1 \le i \le j \\ 1 & \text{if } j < i \le N. \end{cases}$$

This score can be generated by $\phi_t(u) = 0$ if $0 < u \le t$ and 1 if $t < u < 1$,
where $j = [t(N + 1)]$ and $[\cdot]$ denotes the greatest integer function. Now let
$V_j = \sum_{i=1}^{n} a_j(R_i)$. Note that if $N = 2r$ then $V_r = V^*_+$, Mood's statistic.
Hence, $V_j$ is the two-sample analog of the one-sample modified sign
statistic described in Example 2.8.3. Exercise 3.7.8 asks for the centered
version of $V_j$.

The statistic $V_j$ counts the number of observations in the second ($y$)
sample that have ranks greater than $j$. Under the null hypothesis that the
two samples come from the same distribution, $EV_j = (N - j)n/N$. Let
$Z_{(1)} < \cdots < Z_{(N)}$ denote the order statistics of the combined sample and
let $I_i = 1$ if $Z_{(i)}$ is from the second ($y$) sample and 0 otherwise. Then we
have

$$V_j = \sum_{i=j+1}^{N} I_i = n - \sum_{i=1}^{j} I_i.$$

Hence

$$V_j - EV_j = n - \sum_{i=1}^{j} I_i - n(N - j)/N = nj/N - \sum_{i=1}^{j} I_i.$$

We now introduce the empirical cdfs of the two samples, $G_m(x)$ and
$H_n(y)$. If we evaluate their difference at $Z_{(j)}$ we have

$$G_m(Z_{(j)}) - H_n(Z_{(j)}) = \frac{1}{m} \sum_{i=1}^{j} (1 - I_i) - \frac{1}{n} \sum_{i=1}^{j} I_i = \frac{j}{m} - \left(\frac{1}{m} + \frac{1}{n}\right) \sum_{i=1}^{j} I_i = \frac{N}{mn}\,(V_j - EV_j).$$

The Kolmogorov-Smirnov statistic is

$$D = \max_t |G_m(t) - H_n(t)| = \max_{1 \le j \le N} |G_m(Z_{(j)}) - H_n(Z_{(j)})| = \frac{N}{mn} \max_{1 \le j \le N} |V_j - EV_j|.$$

The second equality follows from the fact that $G_m$ and $H_n$ are step
functions with jumps at most at $Z_{(j)}$.

This shows that the Kolmogorov-Smirnov statistic is not a linear rank
statistic. However, it can be expressed as the maximum of a finite number
of linear rank statistics, the modified Mood's statistics. For that reason $D$ is
distribution free under the null hypothesis; the null distribution has been
tabled. The null distribution of $D$ is not symmetric and the limiting
distribution is not normal. Hajek and Sidak (1967, Chapter V) show that

$$\lim_{N \to \infty} P\left\{\left(\frac{mn}{N}\right)^{1/2} D \le t\right\} = 1 - 2 \sum_{j=1}^{\infty} (-1)^{j+1} e^{-2j^2t^2}.$$

The statistic $D$ measures the discrepancy between the empirical cdfs.
Hence the test is designed to reject $H_0\colon G = H$ in favor of general
alternatives $H_A\colon G \ne H$. This is in contrast to the rank tests studied previously
which were designed for location alternatives. The test based on $D$ is useful
for detecting nonlocation alternatives and is also useful in situations in
which Mood's test has high power. There are also one-sided versions of this
test (see Exercise 3.7.19) and one-sample versions.

In the one-sample versions, the empirical cdf is compared to a theoretical
cdf. If the theoretical cdf is parameterized, then the parameters can be
estimated by minimizing $\max_x |F_n(x) - F(x, \theta)|$. Parr (1981) published
a bibliography on these methods.

In the two-sample case, Doksum (1977) reviewed several graphical
methods for comparing two samples based on the Kolmogorov-Smirnov
statistic. If $G(t) = F(t)$ and $H(t) = F(t - \Delta(t))$ then $\Delta(t)$ represents a
general shift function and can be used to measure the difference between
two populations in the absence of location and scale parameters. Doksum
describes how to construct confidence bands for $\Delta(t)$, based on $D$.
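The representation of the Kolmogorov-Smirnov statistic in Example 3.4.5 as a maximum of centered rank statistics, $D = (N/mn) \max_j |V_j - EV_j|$, can be compared numerically with the direct definition. The sketch below assumes no ties; function names and data are illustrative.

```python
def ks_direct(x, y):
    """D = max |G_m - H_n|, evaluated at the combined order statistics."""
    m, n = len(x), len(y)
    d = 0.0
    for z in sorted(x + y):
        g = sum(v <= z for v in x) / m      # empirical cdf of the x's at z
        h = sum(v <= z for v in y) / n      # empirical cdf of the y's at z
        d = max(d, abs(g - h))
    return d


def ks_from_rank_statistics(x, y):
    """D = (N / mn) * max_j |V_j - E V_j|, with V_j = #(Y ranks > j) and
    E V_j = (N - j) n / N under the null hypothesis."""
    m, n = len(x), len(y)
    N = m + n
    pooled = sorted(x + y)
    y_ranks = [pooled.index(v) + 1 for v in y]
    best = 0.0
    for j in range(1, N + 1):
        v_j = sum(r > j for r in y_ranks)
        best = max(best, abs(v_j - (N - j) * n / N))
    return N / (m * n) * best


x = [0.3, 1.7, 2.2, 5.0]
y = [1.0, 2.0, 6.5]
d_direct = ks_direct(x, y)
d_ranks = ks_from_rank_statistics(x, y)
```

Both routes depend on the data only through the ranks, which is why $D$ is distribution free under the null hypothesis.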

3.5. ASYMPTOTIC DISTRIBUTION THEORY UNDER 
ALTERNATIVES 

We first discuss the Mann-Whitney-Wilcoxon statistic W. The asymptotic 
distribution under the null hypothesis was derived in Theorem 3.2.4 using 
the Projection Theorem. The derivation of the asymptotic normality for 
general alternatives using the Projection Theorem is outlined in Exercise 
3.7.15. The moments of W are derived later. Using the asymptotic normality
under the null hypothesis and moments in general, we establish the
consistency of the Mann-Whitney-Wilcoxon test and describe the consistency
class as discussed in Section 1.3. For the case of unequal distribution
shapes, a Behrens-Fisher type test is developed using W. Finally, we derive 
the Pitman efficacy of W and complete the section with a discussion of 
efficiency for the general scores statistics. 

Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be random samples from arbitrary,
continuous distributions $G(x)$ and $H(y)$, respectively. Recall that
$W = \sum\sum T_{ij}$, where $T_{ij} = 1$ if $Y_j - X_i > 0$ and 0 otherwise, is the counting
form of the Mann-Whitney-Wilcoxon statistic. Next, define the following
parameters:

$$\begin{aligned} p_1 &= P(Y > X) = \int_{-\infty}^{\infty} [1 - H(x)]\,g(x)\,dx \\ p_2 &= P(Y_1 > X_1,\ Y_2 > X_1) = \int_{-\infty}^{\infty} [1 - H(x)]^2\,g(x)\,dx \\ p_3 &= P(Y_1 > X_1,\ Y_1 > X_2) = \int_{-\infty}^{\infty} G^2(y)\,h(y)\,dy \end{aligned} \qquad (3.5.1)$$

Theorem 3.5.1. The mean and variance of $W$ are given by

$$EW = mnp_1$$

$$\mathrm{Var}\,W = mn(p_1 - p_1^2) + mn(n - 1)(p_2 - p_1^2) + mn(m - 1)(p_3 - p_1^2).$$

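Proof. First note that $EW = mnET_{11} = mnP(Y > X)$ and
$P(Y > X) = \int_{-\infty}^{\infty} \int_x^{\infty} g(x)h(y)\,dy\,dx = \int_{-\infty}^{\infty} [1 - H(x)]\,g(x)\,dx$. Now,

$$\begin{aligned} EW^2 &= E\Big[\sum_i \sum_j T_{ij}\Big]^2 = E\Big[\sum_i \sum_j \sum_k \sum_l T_{ij} T_{kl}\Big] \\ &= E\Big\{\sum_i \sum_j T_{ij}^2 + \sum_i \sum_{j \ne l} T_{ij} T_{il} + \sum_{i \ne k} \sum_j T_{ij} T_{kj} + \sum_{i \ne k} \sum_{j \ne l} T_{ij} T_{kl}\Big\} \\ &= mnET_{11}^2 + mn(n - 1)ET_{11}T_{12} + mn(m - 1)ET_{11}T_{21} + mn(m - 1)(n - 1)ET_{11}T_{22}. \end{aligned}$$

For example,

$$ET_{11}T_{12} = P(Y_1 > X_1,\ Y_2 > X_1) = \int_{-\infty}^{\infty} [1 - H(x)]^2\,g(x)\,dx.$$

The others are computed in a like manner. Then $\mathrm{Var}\,W = EW^2 - [EW]^2$,
which results in the given equation when simplified.

Now, when $m, n \to \infty$ in such a way that $m/N \to \lambda$, $0 < \lambda < 1$, Exercise
3.7.15 shows that, provided $0 < p_1 < 1$, $[W - EW]/(\mathrm{Var}\,W)^{1/2}$ is
asymptotically $n(0, 1)$. More precisely, if $\bar W = W/mn$, then we have from the
exercise that $N^{1/2}(\bar W - p_1)$ is asymptotically
$n\big(0, (p_2 - p_1^2)/\lambda + (p_3 - p_1^2)/(1 - \lambda)\big)$.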
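As a check on Theorem 3.5.1, the general variance formula should reduce to the familiar null variance $mn(N+1)/12$ when $G = H$, where $p_1 = 1/2$ and $p_2 = p_3 = 1/3$. The sketch below uses exact fractions; the function name is illustrative, and the sample sizes are those of Table 3.1.

```python
from fractions import Fraction


def mww_moments(m, n, p1, p2, p3):
    """EW and Var W from Theorem 3.5.1."""
    mean = m * n * p1
    var = (m * n * (p1 - p1 ** 2)
           + m * n * (n - 1) * (p2 - p1 ** 2)
           + m * n * (m - 1) * (p3 - p1 ** 2))
    return mean, var


# Under G = H (continuous): p1 = 1/2 and p2 = p3 = 1/3.
m, n = 36, 20
mean_null, var_null = mww_moments(m, n, Fraction(1, 2),
                                  Fraction(1, 3), Fraction(1, 3))
```

The same function can be used with values of $p_1$, $p_2$, $p_3$ computed under an alternative to study the variance of $W$ away from the null hypothesis.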

Definition 3.5.1. We will say that H(x) is stochastically larger than G(x) if 
G(x) > H(x) for all x, with strict inequality for at least one x. See Exercise 
3.7.19 for a test for stochastic ordering. 

Example 3.5.1. We now consider the consistency of the Mann-Whitney-Wilcoxon
test. Define $\mu(G, H) = E\bar W = \int [1 - H(x)]\,g(x)\,dx$. Further, let
$\Omega_{so}$ denote the subclass of stochastically ordered distributions. Then

$$\mu(G, H) \begin{cases} = \tfrac{1}{2} & \text{if } G(x) = H(x) \\ > \tfrac{1}{2} & \text{if } G, H \in \Omega_{so} \end{cases} \qquad (3.5.2)$$

where the inequality occurs because $G, H$ are continuous distribution
functions with $G(x) > H(x)$ for some $x$ and hence for an interval of $x$
values. Hence $\bar W$ separates the null hypothesis of $G(x) = H(x)$ for all $x$
from the subclass of stochastically ordered distributions. It is easy to check
that $\mathrm{Var}\,\bar W$ tends to 0 so that $\bar W$ converges in probability to $\mu(G, H)$.
Further, we have the required asymptotic normality under the null hypothesis;
so, Theorem 1.3.1 implies that $\bar W$ provides a consistent test for
stochastically ordered alternatives. Note that if $G(x) = F(x)$ and $H(y) = F(y - \Delta)$,
$\Delta > 0$, then $G, H \in \Omega_{so}$. Hence, the test is consistent for a change in
location. It should be noted that stochastic ordering is more general than a
location change. Compare this to stochastically positive distributions in the
one-sample model discussed in Example 2.4.1.

The sampling model that we have considered thus far supposes that the
two populations have the same shape. We now consider the model
$X_1, \ldots, X_m$ i.i.d. $G(x)$, $G \in \Omega_0$, and $Y_1, \ldots, Y_n$ i.i.d. $H(y - \Delta)$, $H \in \Omega_0$.
Testing $H_0\colon \Delta = 0$ provides a nonparametric analog of the Behrens-Fisher
problem. When $G$ and $H$ are two normal distributions with different
variances, Welch's (1937) modification of the usual pooled two-sample $t$ test
performs quite well. We are now in a position to consider the problem for
general $G, H \in \Omega_s$. The idea is to use the limiting normality of $W$ for
arbitrary $G$ and $H$ and introduce a consistent estimate of the variance
of $W$.

From Exercise 3.7.15 we know that $(W - EW)/(\mathrm{Var}\,W)^{1/2}$ is limiting
$n(0, 1)$. When $G = H$ and $\Delta = 0$, $EW$ and $\mathrm{Var}\,W$ are given by (3.2.3) and
are independent of the common cdf $G = H$. We seek a consistent estimate
of $\mathrm{Var}\,W$. From (3.5.1), with a little algebra, we have

$$\begin{aligned} p_2 - p_1^2 &= \int H^2(t)\,dG(t) - \Big\{\int H(t)\,dG(t)\Big\}^2 \\ p_3 - p_1^2 &= \int G^2(t)\,dH(t) - \Big\{\int G(t)\,dH(t)\Big\}^2. \end{aligned} \qquad (3.5.3)$$

These equations suggest forming the natural estimates by replacing $G$ and
$H$ by $G_m$ and $H_n$, the empirical cdfs. These estimates are most simply
expressed in terms of the placements, defined next. See Orban and Wolfe
(1982) for a discussion of the use of placements in the construction of test
statistics.

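Definition 3.5.2. Given $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$, the placement of $X_i$
among $Y_1, \ldots, Y_n$ is the count $p_i(x) = \#(Y_j < X_i)$, $j = 1, \ldots, n$. Likewise,
the placement of $Y_k$ is $p_k(y) = \#(X_i < Y_k)$, $i = 1, \ldots, m$.

Now,

$$\int H_n(t)\,dG_m(t) = \frac{1}{mn} \sum_{i=1}^{m} p_i(x) = \frac{1}{n}\,\bar p(x) \qquad (3.5.4)$$

and

$$\int G_m(t)\,dH_n(t) = \frac{1}{mn} \sum_{k=1}^{n} p_k(y) = \frac{1}{m}\,\bar p(y), \qquad (3.5.5)$$

$$\int H_n^2(t)\,dG_m(t) = \frac{1}{mn^2} \sum_{i=1}^{m} p_i^2(x), \qquad (3.5.6)$$

$$\mathrm{Est}\{p_2 - p_1^2\} = \frac{1}{mn^2}\Big[\sum_{i=1}^{m} p_i^2(x) - m\,\bar p^2(x)\Big] = \frac{1}{mn^2} \sum_{i=1}^{m} (p_i(x) - \bar p(x))^2 = \frac{S^2(x)}{mn^2}, \qquad (3.5.7)$$

where $S^2(x) = \sum_{i=1}^{m} (p_i(x) - \bar p(x))^2$ is the centered sum of squares of the
placements. The quantity $S^2(y)$ is defined analogously from the placements
$p_k(y)$, so that the estimate of $p_3 - p_1^2$ is $S^2(y)/(m^2 n)$.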

In Theorem 3.5.1, if we replace $m - 1$ and $n - 1$ by $m$ and $n$, then a computationally simple, consistent estimate of $\operatorname{Var} W$ is given by

$$\widehat{\operatorname{Var}}\,W = \bar P(x)\bar P(y) + S^2(x) + S^2(y). \tag{3.5.8}$$

Note that, since the placements are functions of the ranks, the statistic

$$\hat W = \frac{W - mn/2}{(\widehat{\operatorname{Var}}\,W)^{1/2}} \tag{3.5.9}$$

is a rank statistic. When $\Delta = 0$ and $G = H$, $\hat W$ is distribution free and its permutation distribution has been tabled, for selected sample sizes, by Fligner and Policello (1981). However, to ensure that $\hat W$ is even asymptotically distribution free when $\Delta = 0$ and $G \ne H$, we must assume $G, H \in \Omega_s$. The following theorem, due to Fligner and Policello (1981), is the basis for the test.

Theorem 3.5.2. Suppose $\Delta = 0$ and $G, H \in \Omega_s$. Then $\hat W$ is asymptotically $n(0, 1)$.

Proof. When $G, H \in \Omega_s$, from Theorem 3.5.1,

$$\begin{aligned}
EW &= mn\int G(y)h(y)\,dy \\
   &= mn\int [1 - G(-y)]h(y)\,dy \\
   &= mn\int [1 - G(-y)]h(-y)\,dy .
\end{aligned}$$

Comparing the first to the last line (after the change of variable $y \to -y$ in the last integral) shows $\int G(y)h(y)\,dy = 1/2$ and hence $EW = mn/2$ when $G, H \in \Omega_s$. The theorem now follows from Slutsky's Theorem A3, since $(W - mn/2)/(\operatorname{Var} W)^{1/2}$ is limiting $n(0, 1)$ by Exercise 3.7.15 and $\widehat{\operatorname{Var}}\,W$ is consistent.

The test rejects $H_0: \Delta = 0$ for $H_A: \Delta > 0$ when $\hat W \ge Z_\alpha$, where $1 - \Phi(Z_\alpha) = \alpha$.
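The computation of the placement-based statistic can be sketched in a few lines of Python. This is only an illustrative sketch, not code from Fligner and Policello: the function names `placements` and `fligner_policello` are hypothetical, and the data are assumed to contain no ties, as under the continuity assumption of this section.

```python
from math import sqrt


def placements(sample, other):
    """Placement of each value in `sample` among `other`: #(other < value)."""
    return [sum(1 for o in other if o < s) for s in sample]


def fligner_policello(x, y):
    """Return (W, var_hat, w_hat) for samples x (size m) and y (size n).

    Assumes continuous data, so no ties within or between the samples.
    """
    m, n = len(x), len(y)
    px = placements(x, y)            # P_i(x) = #(Y_j < X_i)
    py = placements(y, x)            # P_j(y) = #(X_i < Y_j)
    w = sum(py)                      # W = #(Y_j > X_i)
    pbar_x = sum(px) / m
    pbar_y = sum(py) / n
    s2_x = sum((p - pbar_x) ** 2 for p in px)   # centered sum of squares
    s2_y = sum((p - pbar_y) ** 2 for p in py)
    var_hat = pbar_x * pbar_y + s2_x + s2_y     # estimate (3.5.8)
    if var_hat == 0:
        # Happens when the two samples are completely separated.
        raise ValueError("variance estimate is zero; statistic undefined")
    w_hat = (w - m * n / 2) / sqrt(var_hat)
    return w, var_hat, w_hat
```

One would then compare `w_hat` with the upper normal critical value $Z_\alpha$; for very small samples the tabled permutation distribution is the more appropriate reference.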

In Table 3.2, from Fligner and Policello (1981), the empirical levels based on 10,000 simulations are given for $W$, $\hat W$, $t$, and $t_w$ (Welch's test).

Table 3.2. Empirical Levels Times 1000 for Nominal $\alpha = .05$;
$m = 11$, $n = 10$; $\sigma$ is the scale of $Y$

Distribution           σ       W     Ŵ     t     t_w
Normal                 0.1     81    48    48    48
                       0.25    69    54    50    52
                       1       50    43    48    47
                       4       71    54    60    47
                       10      82    62    69    52
Contaminated normal    0.1     76    51    33    34
                       0.25    65    52    33    33
                       1       48    46    35    33
                       4       68    52    43    32
                       10      83    63    50    35

Note: $t_w$ is Welch's t. The contaminated normal is from Example 2.6.4 with $\epsilon = .01$. Each level is based on 10,000 simulations. Reprinted with permission of the American Statistical Association.

Both the normal and contaminated normal (Example 2.6.4) distributions were simulated for $H(t) = G(t/\sigma)$, with $\sigma$ varying from 0.1 to 10. The test based on $\hat W$ is the most stable, outperforming $t_w$ for the contaminated normal. Fligner and Policello (1981) simulated other underlying distributions and included a study of the power of the tests. The test based on $\hat W$ emerged as superior in maintaining its level, and achieved high power at the same time. Exercise 3.7.16 investigates the behavior of the level of $W$, the regular Mann-Whitney-Wilcoxon test, when $G \ne H$. See Hettmansperger and Malin (1975) and Fligner and Rust (1982) for modifications of Mood's test in case $G$ and $H$ are not assumed to be in $\Omega_s$.

We will not discuss the regularity conditions needed to rigorously develop the efficacy of $W$. These conditions, such as the uniform convergence to normality, are achieved in a way similar to that in the one-sample model; see the discussion following the Pitman regularity conditions in Section 2.6. The formal calculation of the efficacy will now be given. Recall that we need the asymptotic variance under the null hypothesis and the asymptotic mean under the alternative. Now $N^{1/2}(\bar W - p_1)$, with $\bar W = W/(mn)$, is asymptotically $n(0, \sigma^2)$ where

$$\sigma^2 = (p_2 - p_1^2)/\lambda + (p_3 - p_1^2)/(1 - \lambda).$$

Under the null hypothesis $\sigma^2(0) = 1/[12\lambda(1 - \lambda)]$. Furthermore,

$$\mu(\Delta) = p_1 = \int_{-\infty}^{\infty} [1 - F(x - \Delta)] f(x)\,dx$$

and

$$\mu'(0) = \int_{-\infty}^{\infty} f^2(x)\,dx .$$

Hence the efficacy of $W$ is

$$c = \mu'(0)/\sigma(0) = \sqrt{12\lambda(1 - \lambda)} \int_{-\infty}^{\infty} f^2(x)\,dx. \tag{3.5.10}$$

Note that $c$ is simply $[\lambda(1 - \lambda)]^{1/2}$ times the efficacy of the corresponding one-sample test. In Table 2.4 we illustrated the finite sample efficiency of the Wilcoxon signed rank test relative to the t test. A variation on this is discussed by Witting (1960) for the two-sample Mann-Whitney-Wilcoxon test relative to the two-sample t test. Using numerical approximations to the small-sample power functions, Witting approximates the efficiency for small sample sizes. In the cases he studied, the efficiencies for samples as small as $m = n = 10$ never fall below 0.94 for an underlying normal model. Hence, in agreement with Klotz (1963), it would appear that the excellent asymptotic efficiency properties of the rank test relative to the t test at a normal model hold for small samples as well. Further note that $c$ is maximized by $\lambda = 1/2$; hence it is best to take equal sample sizes. We may also conclude from Theorem 2.5.5 that if $\hat\Delta = \operatorname{med}(Y_j - X_i)$, the Hodges-Lehmann estimate of $\Delta$, then $N^{1/2}(\hat\Delta - \Delta)$ is asymptotically $n(0, 1/c^2)$ with $c$ given by (3.5.10).
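The Hodges-Lehmann estimate mentioned above is simple to compute directly from its definition as the median of the $mn$ pairwise differences. The following is a minimal sketch using only the Python standard library; the function name `hodges_lehmann_shift` is an assumption made for illustration, and the brute-force enumeration is intended for small samples only.

```python
from statistics import median


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences Y_j - X_i (brute force, O(mn))."""
    return median(yj - xi for xi in x for yj in y)
```

Because the estimate depends on the data only through the differences $Y_j - X_i$, adding a constant to every $Y$ shifts the estimate by the same constant, which is the behavior expected of a location-shift estimate.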

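When deriving properties of the Mann-Whitney-Wilcoxon statistic it is usually easier to use the counting form $W$. For example, we apply the Projection Theorem 2.5.2 to $W$ rather than $U$, the rank sum form. However, $U$ has exactly the same properties as $W$ because they are linearly related. When discussing the general scores statistics we will consider the rank form given in Definition 3.4.2, namely

$$\bar V = \frac{1}{N}\sum_{j=1}^{n} \phi\Big(\frac{R_j}{N + 1}\Big). \tag{3.5.11}$$

From the discussion preceding (3.4.1) we have $N \operatorname{Var}\bar V \to \lambda(1 - \lambda)\int_0^1 [\phi(u) - \bar\phi]^2\,du$, where $\bar\phi = \int_0^1 \phi(u)\,du$. The asymptotic mean of $\bar V$ under the alternative is developed like (2.9.2) in the one-sample case. The rank of $Y_j$ is $R_j$, the number of observations in the combined sample less than or equal to $Y_j$, and hence $R_j = mG_m(Y_j) + nH_n(Y_j)$, where $G_m(\cdot)$ and $H_n(\cdot)$ are the empirical distribution functions for the $X$ and $Y$ samples, respectively. The statistic can now be written as

$$\begin{aligned}
\bar V &= \frac{1}{N}\sum_{j=1}^{n} \phi\Big(\frac{mG_m(Y_j) + nH_n(Y_j)}{N + 1}\Big) \\
       &= \frac{n}{N}\int_{-\infty}^{\infty} \phi\Big(\frac{m}{N + 1}G_m(y) + \frac{n}{N + 1}H_n(y)\Big)\,dH_n(y).
\end{aligned}$$

Since $G_m(x)$ and $H_n(y)$ converge in probability to $G(x)$ and $H(y)$, respectively, we expect, under some regularity conditions, that

$$\bar V \xrightarrow{P} \mu = (1 - \lambda)\int_{-\infty}^{\infty} \phi\big(\lambda G(y) + (1 - \lambda)H(y)\big)\,dH(y). \tag{3.5.12}$$

For the location model with $G(x) = F(x)$ and $H(y) = F(y - \Delta)$ we have

$$\mu(\Delta) = (1 - \lambda)\int_{-\infty}^{\infty} \phi\big(\lambda F(y) + (1 - \lambda)F(y - \Delta)\big) f(y - \Delta)\,dy .$$

Now make the change of variable $t = y - \Delta$ and then differentiate with respect to $\Delta$ to get

$$\mu'(0) = \lambda(1 - \lambda)\int_{-\infty}^{\infty} \phi'(F(t)) f^2(t)\,dt .$$

Hence the Pitman efficacy is

$$c = [\lambda(1 - \lambda)]^{1/2}\, \frac{\int_{-\infty}^{\infty} \phi'(F(t)) f^2(t)\,dt}{\big\{\int_0^1 [\phi(u) - \bar\phi]^2\,du\big\}^{1/2}}. \tag{3.5.13}$$

The regularity conditions for establishing this result rigorously are similar to the one-sample case discussed after (2.9.3).

The same type of change of variable and integration by parts used to derive (2.9.4) from (2.9.3) can be used on (3.5.13). The result is

$$c = [\lambda(1 - \lambda)]^{1/2}\, \frac{\int_0^1 \phi(u)\phi_f(u)\,du}{\big\{\int_0^1 [\phi(u) - \bar\phi]^2\,du\big\}^{1/2}} \tag{3.5.14}$$

with

$$\phi_f(u) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))} .$$

Since $\int_0^1 \phi_f(u)\,du = 0$ and $\int_0^1 \phi_f^2(u)\,du = I(f)$, Fisher's information given in Definition 2.9.1, Theorem 2.9.1 can be extended immediately to the two-sample case with $[\lambda(1 - \lambda)I(f)]^{1/2}$ in place of $I^{1/2}(f)$. Hence if $V_f$ is the two-sample statistic generated by $\phi_f$, then it is the asymptotically most powerful rank test statistic when $F \in \Omega_0$ is the underlying distribution. Note, also, that $V_f$ is the approximation to the locally most powerful rank test statistic. See (3.3.3) and (3.3.5).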

In the next example we show that (3.5.10), which relates the one- and two-sample Wilcoxon efficacies, holds for general scores statistics. In Example 3.4.3 we constructed two-sample score-generating functions from one-sample score-generating functions. We now reverse the process and discuss the construction of a one-sample score-generating function from a given two-sample score-generating function. It then follows that the efficiency properties are common to the one- and two-sample tests generated by these score functions.

Example 3.5.2. In this example suppose that $F \in \Omega_s$ and define

$$\phi^+(u) = \phi\Big(\frac{u + 1}{2}\Big), \qquad 0 < u < 1,$$

for some given two-sample score-generating function $\phi(u)$ which satisfies $\phi(u) = -\phi(1 - u)$. Hence $\phi$ is an odd function centered at $1/2$. This implies that $\int_0^1 \phi(u)\,du = 0$ and

$$\int_0^1 [\phi^+(u)]^2\,du = 2\int_{1/2}^{1} \phi^2(v)\,dv = \int_0^1 \phi^2(v)\,dv .$$

Furthermore, using

$$\phi^{+\prime}(u) = \tfrac{1}{2}\,\phi'\Big(\frac{u + 1}{2}\Big),$$

we have

$$\int_{-\infty}^{\infty} \phi^{+\prime}(2F(x) - 1) f^2(x)\,dx = \tfrac{1}{2}\int_{-\infty}^{\infty} \phi'(F(x)) f^2(x)\,dx .$$

Now the efficacy of the one-sample test generated from $\phi^+(\cdot)$ is given by (2.9.2) as

$$c = \frac{2\int_{-\infty}^{\infty} \phi^{+\prime}(2F(x) - 1) f^2(x)\,dx}{\big\{\int_0^1 [\phi^+(u)]^2\,du\big\}^{1/2}} .$$

Cancelling the $1/2$, and since $\int_0^1 \phi(u)\,du = 0$, we have that the two-sample efficacy is $[\lambda(1 - \lambda)]^{1/2}$ times the one-sample efficacy. Since the efficiency is the ratio of the squares of the efficacies, and since the factor $\lambda(1 - \lambda)$ cancels, we see that the efficiency properties of the one-sample and two-sample tests are identical.

The Mann-Whitney-Wilcoxon test corresponds to the Wilcoxon signed 
rank test, and Mood’s median test corresponds to the sign test; other 
examples are given in Example 3.4.3. The efficiency calculations for the 
one-sample tests in Section 2.5 can be interpreted now in the two-sample 
case. 

We complete this section by using Example 3.5.2 to anticipate the form of the locally most powerful rank test in the one-sample location model. We begin with the LMPRT score, (3.3.3),

$$a(i) = -E\left\{\frac{f'[F^{-1}(U_{(i)})]}{f[F^{-1}(U_{(i)})]}\right\},$$

where $U_{(1)} < \cdots < U_{(N)}$ are the order statistics from a uniform distribution on $(0, 1)$. Example 3.5.2 suggests that the one-sample score should be

$$a^+(i) = -E\left\{\frac{f'\big[F^{-1}\big((U_{(i)} + 1)/2\big)\big]}{f\big[F^{-1}\big((U_{(i)} + 1)/2\big)\big]}\right\}. \tag{3.5.15}$$

Suppose $F \in \Omega_s$ and define $F^+(x) = 2F(x) - 1$ for $x \ge 0$, the cdf of $|X|$. Then $f^+(x) = 2f(x)$ if $x > 0$ and 0 otherwise. Let $y = F^+(x)$; then $x = (F^+)^{-1}(y)$ and $x = F^{-1}[(y + 1)/2]$. Hence $(F^+)^{-1}(y) = F^{-1}[(y + 1)/2]$, and (3.5.15) becomes

$$a^+(i) = -E\left\{\frac{f'(Y_{(i)})}{f(Y_{(i)})}\right\}, \tag{3.5.16}$$

where $Y_{(1)} < \cdots < Y_{(n)}$ are the order statistics from the distribution $F^+(x)$.

The one-sample test based on $V^+ = \sum a^+(R_i^+)s(X_i)$, where $R_i^+$ is the rank of $|X_i|$ among $|X_1|, \ldots, |X_n|$, is, in fact, the locally most powerful rank test; see Hajek and Sidak (1967, Chapter II) or Randles and Wolfe (1979, Chapter 10). Either (3.5.15) or (3.5.16) can be used to construct $V^+$. The AMPRT, discussed after Theorem 2.9.1, is the approximation to the LMPRT.

In both the one- and two-sample location models the AMPRT and LMPRT have the same efficiency properties. From the point of view of Pitman efficiency the tests are equivalent; however, as mentioned after (3.3.5), the AMPRT is more practical. A direct development of locally most powerful rank tests in the one-sample model, similar to the development in Section 3.3, can be based on Exercise 3.7.18.


3.6. COMPARISON OF DESIGNS

We complete this chapter with a comparison, proposed by Hodges and Lehmann (1973), of two types of randomization in testing a treatment (T) against a control (C). The presence of a known type effect, such as right and left handedness, creates the setting for the construction of both one- and two-sample tests. The proposed designs are compared by comparing the two tests; hence ideas from Chapters 1 and 2 are brought together in this example.

In the Completely Randomized Design (CRD) we have $N$ pairs of subjects. Treatment and control are randomly assigned within each pair, perhaps by the toss of a coin. The usual analysis proceeds by applying a one-sample test to the $T - C$ differences, that is, to the T excesses in Fig. 3.2.

Now suppose that each pair consists of a type A and a type B member, for example right and left hand, and suppose there is a type effect in the data. In a Restrictedly Randomized Design (RRD) we randomly choose $a$ pairs from the $N$ and apply the treatment to the type A member. In the remaining $b = N - a$ pairs, type B gets the treatment. A two-sample test is then applied to the two sets of A excesses in the figure. The two designs are illustrated in Fig. 3.2.

We next develop a model for the RRD. Let $\tau$ represent the type effect and let $X_1, \ldots, X_N$ denote the A excesses. Then before the treatment is applied we have a random sample of size $N$ from $F(x - \tau)$, $F \in \Omega_s$. Let $\delta$ denote the treatment effect. Then $X_1, \ldots, X_a$ is a sample from $F(x - \tau - \delta)$ and $X_{a+1}, \ldots, X_N$, denoted $Y_1, \ldots, Y_b$, is a sample from $F(x - \tau + \delta)$. Hence $2\delta$ represents the difference in locations due to a treatment effect, and we wish to test $H_0: \delta = 0$ versus $H_A: \delta > 0$. We suppose that as $N \to \infty$, $a/N \to \alpha$, $0 < \alpha < 1$, and $b/N \to \beta$, $0 < \beta < 1$. The efficacy of the Mann-Whitney-Wilcoxon test is $c_{MW} = (\alpha\beta)^{1/2}\, 2\, (12)^{1/2} \int f^2(x)\,dx$. The extra factor of 2 appears because the difference in locations is $2\delta$. The efficacy of the two-sample t is $c_{2t} = (\alpha\beta)^{1/2}\, 2/\sigma_F$, where $\sigma_F^2$ is the variance of $F \in \Omega_s$. The efficacy is maximized by taking $\alpha = \beta = 1/2$; hence from now on we will only consider the case in which $N$ is split equally and $a = b$.

                 Member Types
CRD Pairs        A       B       Treatment Excesses
    1            T       C       T - C
    2            C       T         :
    N            C       T       T - C

                 Member Types
RRD Pairs        A       B       A Excesses
    1            T       C       T - C
    2            T       C         :
    a            T       C       T - C
    1            C       T       C - T
    b            C       T       C - T

Figure 3.2. Two kinds of randomized designs.

In the CRD, with probability $1/2$, a type A subject is treated. Let $Z_1, \ldots, Z_N$ denote the treatment excesses. Then $Z_i$ is an A excess with probability $1/2$ and a B excess with probability $1/2$. Given that $Z_i$ is an A excess, $P(Z_i \le z) = F(z - \tau - \delta)$, and given that $Z_i$ is a B excess, $P(Z_i \le z) = F(z + \tau - \delta)$ when $F \in \Omega_s$. This follows from the preceding discussion on the model for the RRD. Hence if $G(\cdot)$ is the distribution function for the T excesses,

$$G(z) = P(Z_i \le z) = \tfrac{1}{2}F(z - \tau - \delta) + \tfrac{1}{2}F(z + \tau - \delta)$$

with density

$$g(z) = \tfrac{1}{2}f(z - \tau - \delta) + \tfrac{1}{2}f(z + \tau - \delta).$$

The mean and variance of $G(\cdot)$ are $\delta$ and $\sigma_F^2 + \tau^2$, respectively. Hence the efficacy of the one-sample t test is $c_{1t} = 1/(\sigma_F^2 + \tau^2)^{1/2}$. In order to find the efficacy of the Wilcoxon signed rank test, we compute, at $\delta = 0$,

$$\int_{-\infty}^{\infty} g^2(z)\,dz = \tfrac{1}{2}\int_{-\infty}^{\infty} f^2(z)\,dz + \tfrac{1}{2}\int_{-\infty}^{\infty} f(z)f(z + 2\tau)\,dz,$$

and $c_W = (12)^{1/2}\int g^2(z)\,dz$.

We now compare the CRD to the RRD through the Pitman efficiency. In the case of the t tests, the efficiency of the one-sample t (CRD) relative to the two-sample t (RRD) is

$$e(1t, 2t) = \frac{c_{1t}^2}{c_{2t}^2} = \frac{\sigma_F^2}{\sigma_F^2 + \tau^2} \le 1 \quad \text{for all } \tau. \tag{3.6.1}$$

In the rank case we compute the efficiency of the Wilcoxon signed rank test (CRD) relative to the Mann-Whitney-Wilcoxon test (RRD) and find

$$e(W, MW) = \left\{\frac{1}{2} + \frac{1}{2}\,\frac{\int f(z)f(z + 2\tau)\,dz}{\int f^2(z)\,dz}\right\}^2. \tag{3.6.2}$$

From the Cauchy-Schwarz inequality, $\big(\int f(z)f(z + 2\tau)\,dz\big)^2 \le \int f^2(z)\,dz \int f^2(z + 2\tau)\,dz = \big(\int f^2(z)\,dz\big)^2$. Hence we have

$$e(W, MW) \le 1 \quad \text{for all } \tau.$$

In either case the RRD is superior to the CRD. If there is no type effect present, $\tau = 0$, the efficiency is 1, so nothing is lost by using the RRD. The actual savings are computed, in the case of an underlying normal population, in Exercise 3.7.17.
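The two efficiencies (3.6.1) and (3.6.2) are easy to evaluate numerically for any density. The sketch below is an illustration only: the helper names (`normal_pdf`, `integrate`, `efficiency_t`, `efficiency_rank`), the trapezoidal rule, and the integration limits are assumptions chosen for simplicity, not part of the Hodges and Lehmann development.

```python
from math import exp, pi, sqrt


def normal_pdf(z, sigma=1.0):
    """Density of n(0, sigma^2)."""
    return exp(-0.5 * (z / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def integrate(func, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h


def efficiency_t(tau, sigma):
    """e(1t, 2t) of (3.6.1): one-sample t (CRD) relative to two-sample t (RRD)."""
    return sigma ** 2 / (sigma ** 2 + tau ** 2)


def efficiency_rank(tau, f, lo=-40.0, hi=40.0):
    """e(W, MW) of (3.6.2) for a density f, by numerical integration.

    The limits lo and hi must cover essentially all of the mass of f.
    """
    cross = integrate(lambda z: f(z) * f(z + 2.0 * tau), lo, hi)
    square = integrate(lambda z: f(z) ** 2, lo, hi)
    return (0.5 + 0.5 * cross / square) ** 2
```

Evaluating `efficiency_rank(tau, normal_pdf)` over a grid of `tau` values gives a quick numerical check on the tabulation requested in Exercise 3.7.17.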


3.7. EXERCISES

3.7.1. Let $R_1, \ldots, R_n$ denote the ranks of $Y_1, \ldots, Y_n$ in the combined sample of size $N = m + n$. Under $H_0: \Delta = 0$ show that $ER_j = (N + 1)/2$, $\operatorname{Var} R_j = (N^2 - 1)/12$ and $\operatorname{Cov}(R_i, R_j) = -(N + 1)/12$ for $i \ne j$. Then show that if $U = \sum_{j=1}^{n} R_j$, $EU = n(N + 1)/2$ and $\operatorname{Var} U = mn(N + 1)/12$.

3.7.2. Suppose $X_1, \ldots, X_n$ are i.i.d. $F$, an arbitrary continuous distribution with mean $\mu$ and variance $\sigma^2$. Let $R_1, \ldots, R_n$ be the ranks of $X_1, \ldots, X_n$. Using the fact that the conditional distribution of $X_1$, given $R_1 = j$, is the distribution of the $j$th order statistic, show that

$$E(X_1 R_1) = \frac{1}{n}\sum_{j=1}^{n} j\,EX_{(j)} = \int_{-\infty}^{\infty} y f(y)\,dy + (n - 1)\int_{-\infty}^{\infty} y f(y)F(y)\,dy .$$

Hence show that the correlation between $X_1$ and $R_1$ is

$$\rho_n = \frac{\mu + (n - 1)\int_{-\infty}^{\infty} y f(y)F(y)\,dy - \mu(n + 1)/2}{\sigma\,[(n^2 - 1)/12]^{1/2}} .$$

Find the correlation $\rho_n$ and $\lim_{n \to \infty} \rho_n$ when $f$ is $n(0, 1)$.

3.7.3. Recall $T$ is the Wilcoxon signed rank statistic discussed in Section 2.2. Derive the following recurrence equation for the distribution of $T$ under $H_0$:

$$P(T = k) = \frac{\bar P_n(k)}{2^n}$$

for $k = 0, 1, \ldots, n(n + 1)/2$, where

$$\bar P_n(k) = \bar P_{n-1}(k) + \bar P_{n-1}(k - n),$$

with $\bar P_0(0) = 1$ and $\bar P_n(k) = 0$ for $k < 0$. Hint: Let $\bar P_n(k)$ be the number of subsets $\{r_1, \ldots, r_j\}$ in $\{1, \ldots, n\}$ for which $\sum r_i = k$.

3.7.4. Suppose $X_{(1)} < \cdots < X_{(n)}$ and $Y_{(1)} < \cdots < Y_{(n)}$ are the order statistics for two samples each of size $n$. Galton's statistic, which can be used to compare two teams with ranked members, is defined as $V = \#(Y_{(i)} > X_{(i)})$, $i = 1, \ldots, n$. Let $P_{m,n}(k)$ be the number of sequences of $m$ x's and $n$ y's such that $V = k$, and derive a recurrence formula for $P_{n,n}(k)$. By computing the distribution of $V$ for various values of $n$, see if you can guess the form of the distribution of $V$.

3.7.5. Let $V = \sum_{j=1}^{n} a(R_j)$ be given in Definition 3.4.1. Prove that the distribution of $V$, under $H_0$, is symmetric about $n\bar a$ provided that $a(i) + a(N - i + 1) = K$ for all $i = 1, 2, \ldots, N$, where $K$ is a constant.

3.7.6. In Example 1.1.1 it was hypothesized that homing pigeons used the sun to navigate. In Example 1.5.1 and Exercise 1.8.11 the data on birds released on sunny and cloudy days suggested that homing pigeons home better on sunny days. The hypothesis can be formally tested using the Mann-Whitney-Wilcoxon test. Following is a portion of the data, randomly selected.

Sunny     17   32   42   42   55   72   97
Cloudy    10   38  105  126  141

The measure is the error angle made with the homing line when the bird disappears over the horizon. Let $\Delta$ be the difference of the sunny and cloudy population medians. Test $H_0: \Delta = 0$ versus $H_A: \Delta < 0$ at approximate level .05, construct a point estimate of $\Delta$, and construct an approximate 95% confidence interval for $\Delta$. Carry out a similar analysis based on Mood's procedure described in Examples 3.4.2 and 3.4.4.

3.7.7. Suppose $X_1, \ldots, X_n$ are i.i.d. $F(x)$, $F \in \Omega_0$. Show that $Y_1 = F(X_1), \ldots, Y_n = F(X_n)$ are i.i.d. $U(0, 1)$. Show that the pdf of $Y_{(i)}$, the $i$th order statistic, is

$$\frac{\Gamma(n + 1)}{\Gamma(i)\Gamma(n - i + 1)}\, y^{i-1}(1 - y)^{n-i}$$

for $0 < y < 1$. Further show that $EY_{(i)} = i/(n + 1)$.

3.7.8. Construct the two-sample score functions that correspond to the Winsorized Wilcoxon signed rank statistic and the modified sign statistic in Examples 2.8.2 and 2.8.3, respectively. Find the asymptotic moments (3.4.1) and (3.4.2) for use in the limiting distribution.

379 Suppose = ** ® general score statistic given in Defini 

tion 3 4 1 Let f'fA) = (^)) 0 ^ **) Prove that F(A) de- 

creases by a - u I as A crosses - X ^ , Hence the prop 
erlies of the general score stalisbc are determined by its behavior 
at the tnn pairwise differences Verify (3 4 4) and (3 4 5) 

3.7.10. Using (3.4.6), discuss how to construct estimates of $\Delta$ that correspond to the statistics in Exercise 3.7.8.

3.7.11. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be random samples from $F(x)$ and $F(y - \Delta)$, $F \in \Omega_0$. We will consider an alternative to Mood's test due to Mathisen (1943) for testing $H_0: \Delta = 0$ versus $H_A: \Delta > 0$. Suppose for simplicity that $m = 2q - 1$ and define $M = \#(Y_j > X_{(q)})$, $j = 1, \ldots, n$. Hence $M$ counts the number of $Y$ values that exceed the median of the $X$'s. Under $H_0: \Delta = 0$, show that

$$P(M = l) = \frac{\dbinom{q - 1 + n - l}{q - 1}\dbinom{q - 1 + l}{q - 1}}{\dbinom{m + n}{m}}$$

for $l = 0, 1, \ldots, n$. Further, $EM = n/2$ and $\operatorname{Var} M = n(m + n + 1)/[4(m + 2)]$.

Using the notation of the sign statistic, note that M = 

= "^siY, — A'jgj). Under use Exercise 2.10.18 to write 

-^[s(0)-f ]+VJ/(0)X,„ + »,(i). 

Now show that, if m,n-><x) so that mf{m + n)-^X, 0 < X < 1, 
then 


X[M-|]4z~n(0,l/[4X]). 

3.7.12. This exercise continues Exercise 3.7.11. We reject H(,:A = 0 for 
/f^:A>0 at approximate size a if M>c where c = n/2 + 
Za(«/[4X])'/^. Show that M > c if and only if 

Hence we can terminate Mathisen’s test as soon as we observe 
or Show that Mood’s test, discussed in Example. 
3.4.4, terminates the one-sided test as soon as we observe 
O'" ^(n-d+i)> where P{V%. > d) = a and d=nl2+ Z„(nX/4)'''^ 
Show that for large n, Mathisen’s test will always terminate before 
Mood’s test. Hence Mathisen’s test achieves a greater savings. 

3.7.13. Carry out a 5% Mood’s test and find a 95% confidence interval 
and point estimate for A, for the data in Example 3.2.2. 

3.7.14. Recall = #(7, > median of combined sample) is Mood’s sta- 
tistic in Example 3.4.4. Suppose n < m and P(V* < /t) = a /2, 
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where k = n/2- and N = m + n. from 

Example 3 42 From (3 4 8) the two sign-confidence intervals that 
determine Mood's test are [AT^^ +i).A’(«,-rf.)l and ( I 

where d^ = r — n+k + l and ^ ^ Note that + (m — n) 

/2 Show that the confidence coefficients for the two individual 
intervals can be approximated as y, •=4'(-2^/2l/»/(A' - 
and = Of - - »)]■'") 

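3.7.15. Suppose $X_1, \ldots, X_m$ are i.i.d. $G(x)$ and $Y_1, \ldots, Y_n$ are i.i.d. $H(y)$, where $G$ and $H$ are arbitrary continuous cdfs. Let

$$\bar W = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} T_{ij},$$

where $T_{ij} = 1$ if $Y_j > X_i$ and 0 otherwise. Suppose $m, n \to \infty$ in such a way that $m/N \to \lambda$, $0 < \lambda < 1$, $N = m + n$. Prove that $N^{1/2}(\bar W - p_1)$ is asymptotically $n(0, \sigma^2)$ where

$$\sigma^2 = (p_2 - p_1^2)/\lambda + (p_3 - p_1^2)/(1 - \lambda).$$

Show, further, that $N \operatorname{Var}\bar W \to \sigma^2$ and, when $0 < p_1 < 1$, $(W - EW)/(\operatorname{Var} W)^{1/2}$ is asymptotically $n(0, 1)$. (This exercise generalizes Theorem 3.2.4.)

3.7.16. Suppose $X_1, \ldots, X_m$ are i.i.d. $G(x)$ and $Y_1, \ldots, Y_n$ are i.i.d. $H(y - \Delta)$ where $G, H \in \Omega_0$. Let $W^* = (W - mn/2)/[mn(m + n + 1)/12]^{1/2}$, the standardized Mann-Whitney-Wilcoxon statistic. When $\Delta = 0$ and $G = H$, $W^*$ is limiting $n(0, 1)$. We wish to investigate the effect on the significance level of $W^*$ when $G \ne H$. Let $\alpha_T = P(W^* \ge Z_\alpha)$; then when $\Delta = 0$ and $G = H$, $\alpha_T \to \alpha$. Here $\alpha$ is called the nominal level and $\alpha_T$ the true level. Suppose $m, n \to \infty$ in such a way that $m/N \to \lambda$, $0 < \lambda < 1$. For arbitrary $G, H \in \Omega_0$, show that

$$\alpha_T \to \begin{cases}
0 & \text{if } p_1 < \tfrac{1}{2}, \\
1 & \text{if } p_1 > \tfrac{1}{2}, \\
1 - \Phi\Big(Z_\alpha \big/ \{12[(1 - \lambda)(p_2 - p_1^2) + \lambda(p_3 - p_1^2)]\}^{1/2}\Big) & \text{if } p_1 = \tfrac{1}{2},
\end{cases}$$

where $p_1$, $p_2$, and $p_3$ are given in (3.5.1). To have any control over $\alpha_T$ we must have $p_1 = \tfrac{1}{2}$. A sufficient condition for $p_1 = \tfrac{1}{2}$ is $G, H \in \Omega_s$.

Consider the case where $H(y) = G(y/\sigma)$, $G \in \Omega_s$, so the distributions differ by the scale factor $\sigma$. Show that

$$p_2 - p_1^2 = \int [1 - G(x/\sigma)]^2\,dG(x) - \frac{1}{4} \to \begin{cases} \tfrac{1}{4} & \text{if } \sigma \to 0, \\ 0 & \text{if } \sigma \to \infty. \end{cases}$$

Likewise $p_3 - p_1^2 \to 0$ or $\tfrac{1}{4}$ as $\sigma \to 0$ or $\infty$. Now show that

$$\alpha_T \to \begin{cases}
1 - \Phi\big(Z_\alpha/\sqrt{3(1 - \lambda)}\big) & \text{if } \sigma \to 0, \\
1 - \Phi\big(Z_\alpha/\sqrt{3\lambda}\big) & \text{if } \sigma \to \infty.
\end{cases}$$

Finally, show for the case of equal sample sizes and $\alpha = .05$, that $\alpha_T$ ranges from .05 to .087 as $\sigma \to 0$ or $\infty$. This indicates that when the distributions are symmetric and differ perhaps by a scale factor, and when the sample sizes are the same, the true level of $W^*$ is not that much different from the nominal level.

3.7.17. Suppose that $F$ is $n(0, \sigma^2)$. Find $e(W, MW)$, (3.6.2), as a function of $\sigma$ and $\tau$. Table and graph $e(W, MW)$ as a function of $\tau/\sigma$.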

3.7.18. Suppose $Z_1, \ldots, Z_N$ are i.i.d. $F \in \Omega_0$. Let $F^+(x) = P(|Z| \le x)$, the cdf of $|Z|$. From Section 1.4, the distribution $F$ can be specified by the triple $(p, G, H)$ where $p = P(Z > 0)$, $G(x) = P(|Z| \le x \mid Z < 0)$ and $H(x) = P(|Z| \le x \mid Z > 0)$. Let $N^+$ be the number of positive $Z$'s. Given $N^+ = n$, let $R^+_{(1)}, \ldots, R^+_{(n)}$ be the ranks of the positive observations among the absolute values. Show that

$$P\big(R^+_{(1)} = r_1, \ldots, R^+_{(n)} = r_n, N^+ = n\big) = \binom{N}{n} p^n (1 - p)^{N - n}\, P\big(R^+_{(1)} = r_1, \ldots, R^+_{(n)} = r_n \mid N^+ = n\big),$$

where $P(R^+_{(1)} = r_1, \ldots, R^+_{(n)} = r_n \mid N^+ = n)$ is given by the result in Theorem 3.3.1.

3.7.19. The one-sided Kolmogorov-Smirnov statistic is $D^+ = \max_z (G_m(z) - H_n(z))$; see Example 3.4.5. This statistic could be used to test for stochastic ordering, Definition 3.5.1. The null hypothesis $H_0: G = H$ would be rejected for $H_A: G \ge H$, with strict inequality somewhere, when $D^+ \ge c$.

a Suppose m “ equal sample sizes Use the reflection pnnci- 
ple m Feller (1968. p 68} to show that, under the null 
hypothesis, 

( i" ) 

A -1.2, ,n 

b In general, it can be shown that 



(See Hajek and Sidak. 196T, Chapter V) If m ■ «, rt-»ee, 
venfy the limit using the result in (a) Hint Use Sterling's 
formula k'— U®y^*k**'^*e"* oi' 'he lae\ona\s 
3 7J0 The result in Theorem 3 3 I can be used to construct LMPRTs in 
the scale model Suppose that X, random sample 

from F(x) and F,, . F, is a random sample from C(^)* 

Fiy/r) Fed,) Suppose the model is sufficiently regular to 
interchange differentiation and expectation 
a Show that the LMPRT for testing II^ t = 1 versus t > 1 
can be based on the statistic 


>-» I 




where y<.,< < are the order statistics from F 

b An important scale model, used m life testing and reliability, is 
the exponential model The pdf is f{x) = e~* if x > 0 and 0 
otherwise Suppose Jf,. X„ is a sample from this expo- 
nential distribution Tlie joint pdf of A’,|, < < Xf„y is 

0 < X] < Xj < < x„ < CO, and 0 otherwise Let 

^2 “ ^(I| ~ ^<11* • ” '^<"-11 

Find the joint pdf of , W„ Argue that IF, , }V„ 

are independent and for i ^ ], ,n, Wi has an exponential 
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distribution with pdf (n — / + l)e mean EWj 

= (« — /+ 1)“*. Now note that = Wi + ■ ■ • + W/ and 

£Z(,)= 2(«-y + i)“'- 

7=1 

c. Using (b), show that the LMPRT for Hq : r = 1 versus : t 
> 1 when F{x) = 1 — x > 0, and 0 otherwise defined by 


v=t<Rj) 

7=1 

where a{i) = — k + 1)“'. Show that EV = n and 


Var V = 


mn 
N- 1 





The normal approximation can be used to approximate critical 
values. The test is due to Savage (1956). 



CHAPTER 4 


The One- and Two-Way Layouts 
and Rank Correlation 


4.1. INTRODUCTION

In the last chapter methods were developed to investigate the difference in locations of two populations. In this chapter we consider the extension to more than two populations. We develop the methods only for ranks (not for general rank scores). Asymptotic distribution theory is developed for approximations under the null hypothesis and for the computation of asymptotic efficiency.

The one- and two-way layout designs are treated in detail. In the former we have $k$ samples and wish to test the null hypothesis that the samples all came from the same population. In the latter case we wish to compare $k$ populations, but the data is labeled by an additional variable. In this case we have data classified in two ways: the population and the block. The one- and two-way layouts are examples of the more general linear model, which is the subject of Chapter 5. We have chosen to treat these special cases in the present chapter because the tests, due to Kruskal and Wallis (1952) for the one-way layout and Friedman (1937) for the two-way layout, are simple to use, do not require a computer, and are widely available in applied texts. The present chapter serves as an introduction for the following chapter, which deals with the general linear model.

Along with the tests that are introduced, we discuss multiple comparisons. This is an important follow-up to any significant comparison of $k$ populations since the comparisons help identify the sources of significance. Further, we develop tests for the null hypothesis against an alternative consisting of a prespecified ordering of the $k$ population locations. These tests for ordered alternatives are generalizations of one-sided tests in the simple location models.

The final section of this chapter contains a discussion of rank correla- 
tion. We then interpret the Kruskal-Wallis and Friedman tests in terms of 
rank correlation. This provides further motivation and rationale for the two 
tests. In addition, we discuss measures of concordance or agreement among 
and between groups of judges. These measures are constructed from rank 
correlations and are also connected to the Friedman test statistic. 


4.2. THE ONE-WAY LAYOUT: THE KRUSKAL-WALLIS TEST 

The sampling model consists of $k$ samples $X_{11}, \ldots, X_{n_1 1}, \ldots, X_{1k}, \ldots, X_{n_k k}$ from $F(x - \theta_1), \ldots, F(x - \theta_k)$, respectively, where $F \in \Omega_0$. This can also be written $X_{ij}$, $i = 1, \ldots, n_j$, $j = 1, \ldots, k$, with cdf $F(x - \theta_j)$, $F \in \Omega_0$. We wish to construct a test of $H_0: \theta_1 = \cdots = \theta_k$ versus $H_A: \theta_1, \ldots, \theta_k$ not all equal. This null hypothesis simply specifies that the locations are all equal without specifying the common location. Without loss of generality, we could take one of the locations to be 0 and work with the remaining $k - 1$ location differences $\Delta_1, \ldots, \Delta_{k-1}$. Then the null hypothesis becomes $H_0: \Delta_1 = \cdots = \Delta_{k-1} = 0$.

Example 4.2.1. In a study of the humoral (blood) basis of behavior, Terkel 
and Rosenblatt (1968) induced maternal behavior in virgin female rats by 
injecting them with blood plasma drawn from females that had just given 
birth. Virgin rats were exposed to young pups, and the time it took them to 
begin retrieving the pups was recorded. Retrieving is a recognized maternal 
behavior that generally appears within 48 hours after giving birth. It is 
known that virgin females will begin to show maternal behavior when they 
are continuously exposed to pups for about 5 days. Hence, the issue is 
whether maternal blood plasma will reduce the time. 

The experiment used 32 virgin female rats, each 60 days old. They were randomly assigned to four groups of size 8. The groups consisted of: (1) rats injected with maternal blood plasma, (2) rats in proestrus (prior to heat) that received plasma from rats in proestrus, (3) rats in diestrus (heat) that received plasma from rats in diestrus, and (4) rats that were injected with a saline solution (placebo). We will consider the data as arising from four populations with respective cdfs $F(x - \theta_i)$, $i = 1, \ldots, 4$, $F \in \Omega_0$, and we wish to test $H_0: \theta_1 = \theta_2 = \theta_3 = \theta_4$ versus $H_A: \theta_1, \theta_2, \theta_3, \theta_4$ not all equal. If the test rejects $H_0$, we then wish to determine which groups are significantly different. Data for this example is analyzed in Example 4.2.2.

The data can be visualized in a two-way array in which each column is a sample. Hence there are $k$ columns and the $j$th column has $n_j$ observations from $F(x - \theta_j)$, $F \in \Omega_0$. The basic strategy is to rank the combined data set of size $N = \sum n_j$ and compare the column rank sums or averages. Let $R_{ij}$ denote the rank of $X_{ij}$ in the combined data and let

$$R_{\cdot j} = \sum_{i=1}^{n_j} R_{ij} \quad \text{and} \quad \bar R_{\cdot j} = R_{\cdot j}/n_j. \tag{4.2.1}$$

If $k = 2$, $R_{\cdot 1}$ is the sum of ranks in the first sample in a two-sample problem. Since $R_{\cdot 1} + R_{\cdot 2} = N(N + 1)/2$, there is no additional information in the second sum of ranks $R_{\cdot 2}$, and it could be dropped.

The same is true in the general $k$-sample problem since $\sum_{j=1}^{k} R_{\cdot j} = N(N + 1)/2$. However, we will retain the entire set, $R_{\cdot 1}, \ldots, R_{\cdot k}$, and take the dependence into account later in the equations for test statistics. This means that we do not have to specify which rank sum is to be dropped.

Under the null hypothesis, $R_{ij}$ has the distributional properties specified by Theorem 3.2.1. Hence Exercise 3.7.1 in Chapter 3 provides the means, variances, and covariances for the ranks in the $k$-sample case. The needed results are given in the following theorem.

Theorem 4.2.1. Suppose the $k$ samples come from a common distribution, so the null hypothesis holds, and let $R_{\cdot j}$ and $\bar R_{\cdot j}$ be given in (4.2.1). Then

$$\begin{aligned}
ER_{\cdot j} &= n_j(N + 1)/2, & E\bar R_{\cdot j} &= (N + 1)/2, \\
\operatorname{Var} R_{\cdot j} &= n_j(N - n_j)(N + 1)/12, & \operatorname{Var}\bar R_{\cdot j} &= (N - n_j)(N + 1)/(12 n_j), \\
\operatorname{Cov}(R_{\cdot i}, R_{\cdot j}) &= -n_i n_j(N + 1)/12, & \operatorname{Cov}(\bar R_{\cdot i}, \bar R_{\cdot j}) &= -(N + 1)/12 .
\end{aligned}$$

Proof. We treat $X_{1j}, \ldots, X_{n_j j}$ as one sample and the rest of the data as a second sample. Then we have immediately from Exercise 3.7.1 that $ER_{\cdot j} = n_j(N + 1)/2$ and $\operatorname{Var} R_{\cdot j} = n_j(N - n_j)(N + 1)/12$, where $n_j$ and $N - n_j$ are the two sample sizes. The $\operatorname{Cov}(R_{\cdot i}, R_{\cdot j})$ is determined as follows. Let $R_{\cdot(i,j)}$ denote the sum of ranks of the combined $i$th and $j$th samples. Then $\operatorname{Var} R_{\cdot(i,j)} = (n_i + n_j)(N - n_i - n_j)(N + 1)/12$. But $R_{\cdot(i,j)} = R_{\cdot i} + R_{\cdot j}$, so

$$\operatorname{Var} R_{\cdot(i,j)} = \operatorname{Var}(R_{\cdot i} + R_{\cdot j}) = \operatorname{Var} R_{\cdot i} + \operatorname{Var} R_{\cdot j} + 2\operatorname{Cov}(R_{\cdot i}, R_{\cdot j}).$$

Hence

$$\operatorname{Cov}(R_{\cdot i}, R_{\cdot j}) = \tfrac{1}{2}\{\operatorname{Var} R_{\cdot(i,j)} - \operatorname{Var} R_{\cdot i} - \operatorname{Var} R_{\cdot j}\},$$

and when the foregoing formulas for the variances on the right side are substituted and the expression simplified, we find $\operatorname{Cov}(R_{\cdot i}, R_{\cdot j}) = -n_i n_j(N + 1)/12$. The formulas for $\bar R_{\cdot j}$ follow immediately from those for $R_{\cdot j}$ by the properties of expectation.
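For very small layouts the moments in Theorem 4.2.1 can be checked by brute force, since under the null hypothesis every assignment of the ranks $1, \ldots, N$ to the observations is equally likely. The sketch below is an illustrative check only; the function name `rank_sum_moments` is hypothetical, and the enumeration of all $N!$ assignments is feasible only for $N$ up to about 8.

```python
from fractions import Fraction
from itertools import permutations


def rank_sum_moments(sizes):
    """Exact null means and covariances of the rank sums R_.1, ..., R_.k.

    `sizes` lists the sample sizes n_1, ..., n_k. All N! rank assignments
    are enumerated, so this is intended for very small N only.
    """
    N = sum(sizes)
    k = len(sizes)
    bounds, start = [], 0
    for n in sizes:                      # positions occupied by each sample
        bounds.append((start, start + n))
        start += n
    count = 0
    first = [Fraction(0)] * k            # running sums of R_.j
    second = [[Fraction(0)] * k for _ in range(k)]   # sums of R_.i * R_.j
    for perm in permutations(range(1, N + 1)):
        sums = [sum(perm[a:b]) for a, b in bounds]
        count += 1
        for i in range(k):
            first[i] += sums[i]
            for j in range(k):
                second[i][j] += sums[i] * sums[j]
    mean = [s / count for s in first]
    cov = [[second[i][j] / count - mean[i] * mean[j] for j in range(k)]
           for i in range(k)]
    return mean, cov
```

Calling `rank_sum_moments([2, 2, 1])` returns exact fractions that can be compared entry by entry with $n_j(N + 1)/2$, $n_j(N - n_j)(N + 1)/12$ and $-n_i n_j(N + 1)/12$.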

Now the difference $\bar R_{\cdot j} - (N + 1)/2$ represents the departure from that expected under the null hypothesis. When the accumulated departures are too large we wish to reject $H_0: \theta_1 = \cdots = \theta_k$. This suggests a test statistic of the form:

$$H = \sum_{j=1}^{k} c_{jN}^2 \left[\frac{\bar R_{\cdot j} - (N + 1)/2}{(\operatorname{Var}\bar R_{\cdot j})^{1/2}}\right]^2. \tag{4.2.2}$$

The weighting constants $c_{1N}^2, \ldots, c_{kN}^2$ are chosen so that $H$ is asymptotically chi-squared with $k - 1$ degrees of freedom. The use of $c_{jN}^2$ rather than $c_{jN}$ will be notationally convenient later. At first thought, a natural choice for $c_{jN}^2$ would seem to be 1. Then $H$ would be the sum of squares of standardized rank averages. However, as pointed out earlier, $\bar R_{\cdot 1}, \ldots, \bar R_{\cdot k}$ are correlated and this will require some adjustment, which is achieved by proper choice of $c_{1N}, \ldots, c_{kN}$.

Following the same argument as in Example 3.2.1, since the $N!/(n_1! \cdots n_k!)$ sequences of combined sample observations are equally likely under the null hypothesis, the distribution of $H$, for any fixed choice of $c_{1N}, \ldots, c_{kN}$, can be tabled. However, a different table is required for every $k$ and every configuration of sample sizes. Hence, this approach is not very practical, and we turn to the asymptotic distribution as a workable alternative. The following theorem forms the basis for the asymptotic distribution theory needed in the one-way layout.

Theorem 4.2.2. Suppose the k samples come from a common distri- 
bution. Suppose j = \, ... ,k in such a way that rij/N^Xj, 0 < 

A,<1, where = Suppose Cw^c, for j = , k. Define T' 

= (T ,, . . . ,T^) where 
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Then, T IS asymptotically A/K^(O.B) where 

^ if i~J 

Proof First, we relabel the observations « 1, , n^,j •» 1, ,k, as 

T,, , Yu, N = 2"i. where the first n^ F’s represent the first sample A’„, 

1=1. , n,, and so forth Let J represent the indices of the yth sample 

and J the complement, hence Y^, j^J, denotes the ^Ih sample and Y,, 
I € J, the remainder of the combined sample 

Now #(y, > yj + n^(«^+ i)/2. ueJ, oey Let if 

> y„ and 0 otherwise, then 

Under the null hypothesis. 

1 0 u*= I and v ^ i 

F(y)-l/2. v-l 

\/2-r(y) U-, 

Hence we have 

_|(«--V)[£0)-l/2] f<2J 

W 1/2 -£(,)] ,e7 

Since Rj—(N+ l)/2 = — ERj)/nj, the projection of Tj is given by 

(423) 

where a, = (A' — if i e / and a, =* — 1 if i eJ Further, from Exercise 
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ii 


(N - 


«/ 


+ (iV-n,) 


_ a N(N-n^) 

IItIjN J n'Kj 

Likewise, from Exercise 4.5.1,

$$\mathrm{Cov}(V_i, V_j) \to -c_i c_j/12.$$

Theorem A10 applies to show that $V_j$, $j = 1, \ldots, k$, are asymptotically
normally distributed and Theorem A11 applies to show that $V' =
(V_1, \ldots, V_k)$ is asymptotically multivariate normal with mean vector 0 and
variance-covariance matrix $B$, given in the statement of the theorem.

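From Theorem 4.2.1,

$$\mathrm{Var}\, T_j = c_{jN}^2 \, \frac{1}{N} \, \frac{(N - n_j)(N + 1)}{12 n_j} \to \frac{c_j^2 (1 - \lambda_j)}{12 \lambda_j}.$$

By the Projection Theorem 2.5.2, $E(T_j - V_j)^2 = \mathrm{Var}\, T_j - \mathrm{Var}\, V_j \to 0$ for
$j = 1, \ldots, k$, and hence the difference in vectors $T - V$ converges to zero
in probability. This implies that $T$ has the same limiting distribution as $V$
and the proof is complete.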


Before discussing the choice of $c_{1N}, \ldots, c_{kN}$ for which $H$, given by
(4.2.2), has an asymptotic chi-square distribution, we present a result from
normal distribution theory. Recall from matrix theory that a matrix $A$ is
idempotent if $A^2 = A$ and that the rank of an idempotent matrix is the sum
of the diagonal elements (the trace).

Theorem 4.2.3. Suppose $(Z_1, \ldots, Z_k)'$ has a $MVN(0, A)$ distribution.
Suppose that $A$ is idempotent with rank $r$. Then $\sum Z_i^2$ has a chi-square
distribution with $r$ degrees of freedom, denoted $\chi^2(r)$.

Proof. There exists an orthogonal matrix $G$ such that $G'AG$ is a diagonal
matrix $D$ with $r$ ones and $k - r$ zeros. Define $U = G'Z$ where $Z' =
(Z_1, \ldots, Z_k)$, then $U$ has a $MVN(0, D)$ distribution (Arnold, 1981, p.
46). Now $\sum Z_i^2 = Z'Z = U'G'GU = U'U = \sum U_i^2$, which is the sum of
squares of $r$ i.i.d. $n(0, 1)$ random variables. Thus $\sum Z_i^2$ has a $\chi^2(r)$
distribution.

The next theorem provides the correction values $c_{1N}, \ldots, c_{kN}$ so that $H$,
(4.2.2), has an asymptotic chi-square distribution.

Theorem 4.2.4. Under the null hypothesis that the k samples come from a
common distribution,

$$H^* = \sum_{j=1}^{k} \left(1 - \frac{n_j}{N}\right) \left\{ \frac{\bar R_j - (N+1)/2}{\sqrt{(N - n_j)(N + 1)/(12 n_j)}} \right\}^2
= \frac{12}{N(N+1)} \sum_{j=1}^{k} n_j \left[\bar R_j - \frac{N+1}{2}\right]^2
= \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$

has an asymptotic $\chi^2(k - 1)$ distribution.

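Proof. Define, from the statement of Theorem 4.2.2,

$$T_j^* = T_j \Big/ \sqrt{(1 - \lambda_j)/(12 \lambda_j)};$$

then it follows that $T^{*\prime} = (T_1^*, \ldots, T_k^*)$ has an asymptotic $MVN(0, B^*)$
distribution with

$$b_{ij}^* = \begin{cases} c_j^2 & i = j \\ -c_i c_j \left[ \lambda_i \lambda_j / \{(1 - \lambda_i)(1 - \lambda_j)\} \right]^{1/2} & i \neq j. \end{cases}$$

Since $n_j/N \to \lambda_j$, we have, approximately, from (4.2.2),

$$H = \sum_{j=1}^{k} c_{jN}^2 \left\{ \frac{\bar R_j - (N+1)/2}{\sqrt{(N - n_j)(N+1)/(12 n_j)}} \right\}^2 \doteq \sum_{j=1}^{k} \left\{ \frac{T_j}{\sqrt{(1 - \lambda_j)/(12 \lambda_j)}} \right\}^2 = \sum_{j=1}^{k} T_j^{*2}.$$

Now, Theorem A2(b) implies that $H$ converges in distribution to $\sum Z_j^2$,
where $(Z_1, \ldots, Z_k)'$ has a $MVN(0, B^*)$ distribution. Further, from Theorem
4.2.3, $\sum Z_j^2$ has a chi-square distribution provided $B^*$ is idempotent.
Hence, we seek the constants $c_1, \ldots, c_k$ such that $(B^*)^2 = B^*$.

By comparing $(B^*)^2$ to $B^*$ it is possible to anticipate the solution. We
state the solution and verify that $B^*$ is idempotent. Let

$$c_j^2 = 1 - \lambda_j, \qquad j = 1, \ldots, k.$$

Hence $b_{ij}^* = 1 - \lambda_j$ if $i = j$ and $b_{ij}^* = -(\lambda_i \lambda_j)^{1/2}$ when $i \neq j$. We can then
write

$$B^* = I - \delta \delta'$$

where $\delta' = (\lambda_1^{1/2}, \ldots, \lambda_k^{1/2})$ and $I$ is the $k \times k$ identity. Since
$\delta'\delta = \sum \lambda_i = 1$, it is easy to check that $B^*$ is idempotent. The rank of
$B^*$ = trace of $B^*$ = $\sum (1 - \lambda_j) = k - \sum \lambda_j = k - 1$. Hence, we
have $k - 1$ degrees of freedom and the proof is complete. In Exercise 4.5.2
you are asked to carry out the algebraic reduction to verify the other
equations for $H^*$. The third equation is the most practical.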
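
The idempotency step lends itself to a quick numerical check. The following Python sketch builds $B^* = I - \delta\delta'$ with $\delta_j = \lambda_j^{1/2}$ for an arbitrary choice of limiting proportions and compares $(B^*)^2$ with $B^*$. The function names and the particular $\lambda_j$ values are illustrative assumptions, not part of the text.

```python
import math

def b_star(lams):
    """Return B* = I - delta delta' with delta_j = sqrt(lambda_j)."""
    delta = [math.sqrt(lam) for lam in lams]
    k = len(lams)
    return [[(1.0 if i == j else 0.0) - delta[i] * delta[j] for j in range(k)]
            for i in range(k)]

def matmul(a, b):
    """Plain-Python product of two square matrices of the same order."""
    k = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]

lams = [0.2, 0.3, 0.5]   # hypothetical limiting proportions; they must sum to 1
B = b_star(lams)
B2 = matmul(B, B)
k = len(lams)
max_diff = max(abs(B2[i][j] - B[i][j]) for i in range(k) for j in range(k))
trace = sum(B[i][i] for i in range(k))
print(max_diff, trace)   # max_diff is numerically zero; trace equals k - 1
```

The check depends on the proportions summing to one; with any other total, $\delta'\delta \neq 1$ and the product no longer reproduces $B^*$.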


The statistic $H^*$ is called the Kruskal-Wallis (1952) statistic and rejects
$H_0: \theta_1 = \cdots = \theta_k$ at approximate level $\alpha$ when $H^* \ge \chi^2_\alpha(k - 1)$, where
$\chi^2_\alpha(k - 1)$ is the $1 - \alpha$ percentile of the chi-square distribution with $k - 1$
degrees of freedom.

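When the Kruskal-Wallis test rejects $H_0$, we can construct pairwise
multiple comparisons to locate the source of significance. There are
$k(k - 1)/2$ pairwise comparisons, each based on the difference in the column
average ranks. Let

$$D_{ij} = N^{-1/2} (\bar R_i - \bar R_j). \qquad (4.2.4)$$

Under the null hypothesis, $ED_{ij} = 0$ and from Theorem 4.2.1,

$$\mathrm{Var}\, D_{ij} = \frac{1}{N} \left[ \mathrm{Var}\, \bar R_i + \mathrm{Var}\, \bar R_j - 2\,\mathrm{Cov}(\bar R_i, \bar R_j) \right]
= \frac{N+1}{12} \left[ \frac{1}{n_i} + \frac{1}{n_j} \right]
\to \frac{1}{12} \left[ \frac{1}{\lambda_i} + \frac{1}{\lambda_j} \right]. \qquad (4.2.5)$$

In Theorem 4.2.2 take $c_{jN} = 1$, $j = 1, \ldots, k$; then an application of Theorem
A2(b) to the differences implies that $D_{ij}$ is asymptotically $n(0, \sigma^2)$ with
$\sigma^2$ given by (4.2.5).

Let $\alpha$ denote a prescribed overall error rate for the experiment, and let
$\alpha' = 2\alpha/[k(k - 1)]$ denote the pairwise comparison error rate. We will
declare $\theta_i$ and $\theta_j$ significantly different at overall level $\alpha$ if

$$|D_{ij}| \ge Z_{\alpha'/2} (\mathrm{Var}\, D_{ij})^{1/2}$$

or

$$|\bar R_i - \bar R_j| \ge Z_{\alpha'/2} \left[ \frac{N(N+1)}{12} \right]^{1/2} \left[ \frac{1}{n_i} + \frac{1}{n_j} \right]^{1/2}, \qquad (4.2.6)$$

where $Z_{\alpha'/2}$ is the upper $\alpha'/2$ percentile of the standard normal distribution.

The next theorem provides the interpretation of $\alpha$ as the overall error
rate.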

Theorem 4.2.5. Under the null hypothesis, the probability of committing
at least one error with (4.2.6), for $1 \le i < j \le k$, is bounded above by $\alpha$,
when $N$ is large.

Proof. Let $E_{ij}$ be the event $|D_{ij}| \ge Z_{\alpha'/2} (\mathrm{Var}\, D_{ij})^{1/2}$. From the asymptotic
normality of $D_{ij}$ we have, under the null hypothesis, $P(E_{ij}) \doteq \alpha'$. Hence the
probability that $D_{ij}$ commits an error is approximately $\alpha'$. The probability
of at least one error is

$$P\Big( \bigcup_{i<j} E_{ij} \Big) \le \sum_{i<j} P(E_{ij}) \doteq \frac{k(k-1)}{2}\, \alpha' = \alpha.$$

The inequality is known as Bonferroni's inequality.

The preceding multiple comparisons were first suggested by Dunn
(1964). For other approaches see the discussion in Miller (1981).

Example 4.2.2. Data from the experiment performed on 32 virgin rats
(Example 4.2.1) is given in Table 4.1. The measurement is time until
retrieving behavior is established. The unit of time is the length of an
observation session. Hence, a value of 0.5 means the behavior began half
way into the first observation session. In ranking the data, we have assigned
the average rank to tied observations.

Table 4.1. Latency Times (Artificial Data)

         Maternal     Proestrus    Diestrus
         Plasma       Plasma       Plasma       Saline

         0.5(2)^a     1.1(6)       0.4(1)       0.9(4)
         0.7(3)       1.6(8)       1.9(10)      2.1(11)
         1.0(5)       3.7(18)      2.4(13.5)    3.0(16)
         1.2(7)       4.3(20)      2.8(15)      4.7(21.5)
         1.7(9)       4.7(21.5)    3.9(19)      6.4(25)
         2.3(12)      5.6(24)      5.4(23)      6.6(26.5)
         2.4(13.5)    6.6(26.5)    11.4(31)     8.5(28)
         3.1(17)      8.8(29)      20.4(32)     10.0(30)

R_j      68.5         153          144.5        162
ER_j     132          132          132          132

^a Number in parentheses is the rank.


To test $H_0: \theta_1 = \cdots = \theta_4$ versus $H_A: \theta_1, \theta_2, \theta_3, \theta_4$ not equal at $\alpha =
.05$, we reject $H_0$ if $H^* \ge \chi^2_{.05}(3)$, where $H^*$ is the Kruskal-Wallis statistic,
given in Theorem 4.2.4, and $\chi^2_{.05}(3) = 7.81$. The data yields $H^* = 7.85$, and
hence we reject $H_0$ and conclude there are differences among the populations.
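
The arithmetic behind this test can be reproduced directly from the rank sums in Table 4.1. The sketch below uses the rank-sum computational form of the Kruskal-Wallis statistic, $12/[N(N+1)] \sum R_j^2/n_j - 3(N+1)$, with no correction for ties. The function name is an illustrative assumption.

```python
def kruskal_wallis(rank_sums, sizes):
    """H* = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), no tie correction."""
    n_total = sum(sizes)
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)

rank_sums = [68.5, 153.0, 144.5, 162.0]   # column rank sums from Table 4.1
sizes = [8, 8, 8, 8]                      # eight rats per group
h_star = kruskal_wallis(rank_sums, sizes)
print(round(h_star, 2), h_star > 7.81)    # compare with the chi-square cutoff 7.81
```

Working from rank sums rather than raw data keeps the example tied to the table; starting from the latency times would additionally require a midrank routine for the tied observations.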

In Table 4.2, we have recorded the absolute differences $|R_i - R_j|$. Since
the sample sizes are equal, (4.2.6) can be converted to rank sums and
becomes

$$|R_i - R_j| \ge Z_{\alpha'/2} \sqrt{nN(N+1)/6}.$$

Table 4.2. Absolute Differences of Rank Sums

        MP      PP      DP
PP      84.5
DP      76      8.5
S       93.5    9       17.5

It is clear from Table 4.2 that the significance arises from the differences
between the maternal plasma group and all other groups. Further, there is
not much difference among the others. If we take an overall error rate of
$\alpha = .12$, then $\alpha' = .02$, $Z_{\alpha'/2} = 2.326$ and we would use $|R_i - R_j| \ge 87.3$. If
$\alpha = .24$, then $|R_i - R_j| \ge 77.1$. Hence, to declare significant differences, we
must assign a fairly large overall error rate. This is not surprising in this
example since $H^*$ is just barely significant at the 5% level, and we have
small sample sizes on which to base the multiple comparisons.
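
The cutoffs quoted for the rank-sum differences follow from the Bonferroni argument with equal sample sizes. A minimal Python sketch, assuming the cutoff $Z_{\alpha'/2}\sqrt{nN(N+1)/6}$ with $\alpha' = 2\alpha/[k(k-1)]$; the function name and the group labels are illustrative.

```python
from statistics import NormalDist
import math

def critical_rank_sum_difference(alpha, k, n):
    """Bonferroni cutoff for |R_i - R_j| with k samples of common size n."""
    n_total = k * n
    alpha_pair = 2.0 * alpha / (k * (k - 1))            # per-comparison rate
    z = NormalDist().inv_cdf(1.0 - alpha_pair / 2.0)    # upper alpha'/2 point
    return z * math.sqrt(n * n_total * (n_total + 1) / 6.0)

rank_sums = {"MP": 68.5, "PP": 153.0, "DP": 144.5, "S": 162.0}
names = list(rank_sums)
for alpha in (0.12, 0.24):
    cut = critical_rank_sum_difference(alpha, k=4, n=8)
    flagged = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
               if abs(rank_sums[a] - rank_sums[b]) >= cut]
    print(alpha, round(cut, 1), flagged)
```

Listing the flagged pairs alongside each cutoff makes it easy to see how the conclusions change as the overall error rate is relaxed.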

The idea of Pitman efficiency can be extended to tests whose statistics
have asymptotic chi-square distributions. A multivariate version of Theorem
2.6.1 is needed. We present only a heuristic development; a rigorous
treatment can be found in Hajek and Sidak (1967, Chapter VI). Recall,
from the discussion following Theorem 2.6.1, that the asymptotic distribution
of a test statistic, under a sequence of alternatives converging to the
null hypothesis, changes only in the mean. Hence, when we know the
asymptotic distribution under $H_0$, we need to investigate the behavior of
the mean of the test statistic under a sequence of alternatives.

In Theorem 4.2.4, we showed that the vector $T^{*\prime} = (T_1^*, \ldots, T_k^*)$, where

$$T_j^* = c_{jN} N^{-1/2} \left[ \bar R_j - \frac{N+1}{2} \right] \Big/ \sqrt{\left(1 - \frac{n_j}{N}\right) \Big/ \left(12\, \frac{n_j}{N}\right)},$$

is asymptotically $MVN(0, B^*)$. We have replaced $\lambda_j$ by $n_j/N$ in the definition
of $T_j^*$. The following theorem describes the behavior of $ET_j^*$ under a
sequence of alternatives. The calculations are similar to those in Example
2.5.7.

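Theorem 4.2.6. Suppose the Pitman regularity conditions in Section 2.6
hold. Let $\theta_N' = (\alpha + \beta_1/N^{1/2}, \ldots, \alpha + \beta_k/N^{1/2})$. Then, as
$N \to \infty$,

$$ET_j^* \to c_j \left[ \frac{12 \lambda_j}{1 - \lambda_j} \right]^{1/2} \int f^2(x)\,dx \; (\beta_j - \bar\beta)$$

for $j = 1, \ldots, k$, where $c_{jN} \to c_j$ and $\bar\beta = \sum \lambda_i \beta_i$.

Proof. Initially suppose that $X_{ij}$ has cdf $F(x - \theta_j)$, $i = 1, \ldots, n_j$, $j =
1, \ldots, k$. Since $R_j = \sum_{i=1}^{n_j} R_{ij}$, where $R_{ij}$ is the rank of $X_{ij}$, we have

$$R_j = \sum_{i \neq j} W_{ij} + n_j(n_j + 1)/2,$$

where $W_{ij} = \#(X_{ui} < X_{vj})$, $u = 1, \ldots, n_i$, $v = 1, \ldots, n_j$. Further,

$$EW_{ij} = n_i n_j P(X_{1i} < X_{1j}) = n_i n_j \int F(t - \theta_i) f(t - \theta_j)\,dt.$$

Expanding $EW_{ij}$ as a function of $(\theta_i, \theta_j)$ about (0, 0) we have

$$EW_{ij} \doteq n_i n_j \left\{ \frac{1}{2} + \theta_j \int f^2(x)\,dx - \theta_i \int f^2(x)\,dx \right\},$$

where the error of the approximation is of small order in $\theta_i$ and $\theta_j$. This
yields

$$ER_j \doteq n_j(N - n_j) \left\{ \frac{1}{2} + \theta_j \int f^2(x)\,dx \right\} - n_j \sum_{i \neq j} n_i \theta_i \int f^2(x)\,dx + \frac{n_j(n_j + 1)}{2}$$

and

$$E\left[ \bar R_j - \frac{N+1}{2} \right] \doteq N \int f^2(x)\,dx \left[ \theta_j - \sum_{i=1}^{k} \frac{n_i}{N}\, \theta_i \right].$$

Now substitute $\theta_j = \alpha + \beta_j/N^{1/2}$ and take the limit as $N \to \infty$, then

$$ET_j^* \to c_j \left[ \frac{12 \lambda_j}{1 - \lambda_j} \right]^{1/2} \int f^2(x)\,dx \; (\beta_j - \bar\beta).$$

This completes the proof.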

Since we are interested in the Kruskal-Wallis statistic $H^*$, let $c_{jN} =
[(N - n_j)/N]^{1/2}$, so that $c_{jN} \to (1 - \lambda_j)^{1/2}$. Then the extension of Theorem 2.6.1 states
that $T^{*\prime} = (T_1^*, \ldots, T_k^*)$ is asymptotically $MVN(\mu, B^*)$, where $b_{ij}^* = 1 - \lambda_i$
if $i = j$, and $b_{ij}^* = -(\lambda_i \lambda_j)^{1/2}$ if $i \neq j$, and $\mu_j = 12^{1/2} \int f^2(x)\,dx \; \lambda_j^{1/2} (\beta_j - \bar\beta)$.

Finally, recall that $H^* = \sum T_j^{*2}$. This statistic continues to have an
asymptotic chi-square distribution. However, it is a noncentral chi-square
distribution with $k - 1$ degrees of freedom. The noncentrality parameter is
found by substituting the means into $H^*$ and is given by

$$\delta_{H^*} = \sum_{j=1}^{k} [ET_j^*]^2 = 12 \left( \int f^2(x)\,dx \right)^2 \sum_{j=1}^{k} \lambda_j (\beta_j - \bar\beta)^2.$$

The magnitude of the asymptotic power, along the sequence of local
alternatives, is determined by the noncentrality parameter. Since the
expected value of a noncentral chi-square distribution with $r$ degrees of
freedom and noncentrality parameter $\delta$ is $r + \delta$, larger values of $\delta$
correspond to larger values of asymptotic power. See Andrews (1954) for further
discussion. This prompts the definition of asymptotic efficiency as the ratio
of noncentrality parameters; see Hannan (1956).

Let $(k - 1)F$ denote $(k - 1)$ times the usual one-way $F$ test. Under
regularity conditions, discussed by Arnold (1981, Chapter 10), $(k - 1)F$
has an asymptotic noncentral chi-square distribution when $\theta_N' =
(\alpha + \beta_1/N^{1/2}, \ldots, \alpha + \beta_k/N^{1/2})$. The noncentrality parameter is found by
substituting the mean into the equation for $F$ and is given by

$$\delta_F = \sum_{j=1}^{k} \lambda_j (\beta_j - \bar\beta)^2 / \sigma^2, \qquad (4.2.9)$$

where $\sigma^2$ is the variance of $F$; see Arnold (1981, p. 93).

Hence we have

$$e(H^*, F) = \delta_{H^*}/\delta_F = 12 \sigma^2 \left( \int f^2(x)\,dx \right)^2, \qquad (4.2.10)$$

which is the Pitman efficiency of the Wilcoxon signed rank test relative to
the one-sample t test and the Pitman efficiency of the Mann-Whitney-
Wilcoxon test relative to the two-sample t test. Thus the Kruskal-Wallis
test shares the efficiency properties of the one- and two-sample rank tests.
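
To get a feel for the size of the efficiency $12\sigma^2(\int f^2(x)\,dx)^2$ in (4.2.10), it can be evaluated numerically for particular densities. The sketch below uses a simple trapezoid rule; the two densities (standard normal and double exponential), the integration limits, and the step count are illustrative assumptions rather than choices made in the text.

```python
import math

def efficiency(pdf, variance, lo=-40.0, hi=40.0, steps=200_000):
    """12 * sigma^2 * (integral of f^2)^2, with the integral by the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.5 * (pdf(lo) ** 2 + pdf(hi) ** 2)
    for i in range(1, steps):
        total += pdf(lo + i * h) ** 2
    return 12.0 * variance * (total * h) ** 2

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def laplace_pdf(x):
    return 0.5 * math.exp(-abs(x))      # double exponential, variance 2

e_normal = efficiency(normal_pdf, 1.0)   # close to 3/pi
e_laplace = efficiency(laplace_pdf, 2.0) # close to 1.5
print(round(e_normal, 3), round(e_laplace, 3))
```

The normal case gives a value slightly below one and the heavier-tailed double exponential a value above one, in line with the efficiency behavior of the one- and two-sample rank tests.
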

It may well be that the omnibus alternative hypothesis of unequal
locations is not appropriate for the experiment under consideration. The
researcher may wish to detect an increasing (or decreasing) experimental
effect. This is similar to the one-sided alternative in the one- and two-
sample problems. The Kruskal-Wallis test is not appropriate because it is
designed to detect any departure from equal locations. It is possible to
tailor a test which has more power to detect an increasing alternative.
Suppose the data consists in k samples as described at the beginning of
this section. Suppose we wish to test $H_0: \theta_1 = \cdots = \theta_k$ versus $H_A: \theta_1
\le \cdots \le \theta_k$ with at least one strict inequality. We construct a statistic
which assesses the degree of agreement between the observed average
ranks, $\bar R_j$, $j = 1, \ldots, k$, and the hypothesized ordering. Let

$$L = N^{-1/2} \sum_{j=1}^{k} \left( j - \frac{k+1}{2} \right) \left[ \bar R_j - \frac{N+1}{2} \right]; \qquad (4.2.11)$$

then large values of $L$ support the alternative hypothesis. Under the null
hypothesis, $EL = 0$ and

$$\mathrm{Var}\, L = \frac{N+1}{12} \sum_{j=1}^{k} \frac{1}{n_j} \left( j - \frac{k+1}{2} \right)^2; \qquad (4.2.12)$$

see Exercise 4.5.3. When the sample sizes are all equal to $n$, $\mathrm{Var}\, L = k(k^2 -
1)(nk + 1)/(144n)$. Theorem 4.2.2, with $c_{jN} = j - (k + 1)/2$, $j = 1, \ldots, k$,
and Theorem A2(b) imply that $L$ is asymptotically $n(0, \sigma^2)$ with $\sigma^2$ given by
(4.2.12). Hence, to test $H_0: \theta_1 = \cdots = \theta_k$ versus $H_A: \theta_1 \le \cdots \le \theta_k$ with
at least one strict inequality, reject $H_0$ at approximate level $\alpha$ if $L
\ge Z_\alpha (\mathrm{Var}\, L)^{1/2}$, where $Z_\alpha$ is the upper $\alpha$ percentile of the standard normal
distribution.

The problem has the flavor of regression. When the ordering specified 
by the alternative hypothesis is quantitative, rather than qualitative, the 
regression methods in the next chapter are appropriate. With only a 
qualitative ordering (ordinal scale) to work with, the test based on L can be 
quite useful. A significant loss of power can result when the ordered 
alternative is appropriate, but the Kruskal-Wallis test is used. 

Example 4.2.3. In the Stanford heart transplant study various quantitative
and semiquantitative measurements were taken on the patients. One measure,
the mismatch score, indicates the degree to which the donor and the
recipient are mismatched for tissue type. It could be hypothesized that
survival time will tend to increase with lower mismatch scores. The survival
times, presented by Mosteller and Tukey (1977, p. 571), are given in Table
4.3. Mismatch scores are classified as low (0-1), medium (1-2), and high
(2- ). If $\theta_L$, $\theta_M$, and $\theta_H$ denote the population median survival times
corresponding to these three groups, then we wish to test $H_0: \theta_L = \theta_M = \theta_H$
versus $H_A: \theta_L \ge \theta_M \ge \theta_H$ with at least one strict inequality. If we take
$\alpha = 0.05$, then we reject $H_0$ if $L \le -Z_\alpha (\mathrm{Var}\, L)^{1/2}$. Now $n_1 = 14$, $n_2 = 13$,
$n_3 = 12$, $N = 39$, $\mathrm{Var}\, L = .52$, $-Z_{.05} = -1.645$ and we reject $H_0$ if $L \le
-1.19$. For $k = 3$, $L = [1/(39)^{1/2}](\bar R_3 - \bar R_1) = -0.8$, and we fail to reject
$H_0$. Hence the data does not support, at $\alpha = 0.05$, the hypothesis that
survival time tends to increase with decreasing mismatch scores. There are
other variables, such as age or waiting time for the donor, or the general
physical condition of the patient, that may have a stronger impact on
survival time.

Table 4.3. Survival Times

                   Mismatch Category
           Low            Medium         High

           44 (11)^a      15 (5)         3 (2)
           551 (33)       280 (30)       136 (27)
           127 (26)       1024 (38)      65 (22.5)
           1 (1)          253 (29)       25 (7)
           297 (31)       66 (24)        64 (21)
           46 (12)        29 (9)         322 (32)
           60 (19)        161 (28)       23 (6)
           65 (22.5)      624 (34)       54 (18)
           12 (4)         39 (10)        63 (20)
           1350 (39)      51 (16.5)      50 (15)
           730 (35)       68 (25)        10 (3)
           47 (13)        836 (36)       48 (14)
           994 (37)       51 (16.5)
           26 (8)

\bar R_j   21             23             16

^a Number in parentheses is the rank in the combined sample.
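
The numbers in this example can be retraced with a few lines of Python. The sketch assumes the form $L = N^{-1/2} \sum_j [j - (k+1)/2]\,\bar R_j$, which corresponds to the choice $c_{jN} = j - (k+1)/2$ in Theorem 4.2.2, together with the null variance $[(N+1)/12] \sum_j [j - (k+1)/2]^2/n_j$ implied by Theorem 4.2.1. The function name is illustrative, and the average ranks are the rounded values printed in Table 4.3.

```python
from statistics import NormalDist
import math

def ordered_alternative_L(mean_ranks, sizes):
    """Return (L, Var L) for scores j - (k + 1)/2 applied to the average ranks."""
    k = len(sizes)
    n_total = sum(sizes)
    scores = [j - (k + 1) / 2.0 for j in range(1, k + 1)]
    stat = sum(c * r for c, r in zip(scores, mean_ranks)) / math.sqrt(n_total)
    var = (n_total + 1) / 12.0 * sum(c * c / n for c, n in zip(scores, sizes))
    return stat, var

# Low, medium, high mismatch groups: rounded average ranks and sample sizes.
stat, var = ordered_alternative_L([21.0, 23.0, 16.0], [14, 13, 12])
cutoff = -NormalDist().inv_cdf(0.95) * math.sqrt(var)   # lower-tail test at 0.05
print(round(stat, 2), round(var, 2), round(cutoff, 2), stat <= cutoff)
```

Because the scores sum to zero, centering the average ranks at $(N+1)/2$ does not change the value of the statistic, so the centering is omitted in the code.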

Terpstra (1952) and Jonckheere (1954) independently proposed a test for
the ordered alternative which is based on the pairwise Mann-Whitney-
Wilcoxon statistics. This approach has the attractive feature that the
comparison of samples i and j does not depend on the rest of the combined
data. For testing $H_0: \theta_1 = \cdots = \theta_k$ versus $H_A: \theta_1 \le \cdots \le \theta_k$ with at
least one strict inequality, let

$$J = \sum_{i<j} W_{ij}, \qquad (4.2.13)$$

where $W_{ij} = \#(X_{vj} > X_{ui})$, $u = 1, \ldots, n_i$, $v = 1, \ldots, n_j$, discussed in Section
3.2. We reject for large values of $J$.

Let $W' = (W_{12}, \ldots, W_{1k}, W_{23}, \ldots, W_{k-1,k})$, the $k(k - 1)/2$-component
vector of Mann-Whitney-Wilcoxon statistics, and let $N = \sum n_i$, the combined
sample size. Suppose that $n_i/N \to \lambda_i$, $0 < \lambda_i < 1$. Under $H_0$, the limiting
multivariate normality of $N^{-3/2}(W - EW)$ follows from the componentwise
limiting normality of the projections, Theorem 3.2.4, and the convergence
of the variance-covariance matrix, Theorem A13. In order to make the test
operational we need the covariances.

Under $H_0$, from (3.2.3), $EW_{ij} = n_i n_j/2$, $\mathrm{Var}\, W_{ij} = n_i n_j(n_i + n_j + 1)/12$.
Further, $\mathrm{Cov}(W_{st}, W_{uv}) = 0$ when $s, t$ are both different from $u, v$. Exercise
4.5.4 provides the other covariances.

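In order to find the variance of $J$ we define

$$W_i = \sum_{u=1}^{i-1} W_{ui}, \qquad (4.2.14)$$

so that $J = \sum_{i=2}^{k} W_i$. Further, for $i < j$,

$$\mathrm{Cov}(W_i, W_j) = \mathrm{Cov}\Big( \sum_{u=1}^{i-1} W_{ui}, \sum_{v=1}^{j-1} W_{vj} \Big)
= \sum_{u=1}^{i-1} \mathrm{Cov}\Big( W_{ui}, \sum_{v=1}^{j-1} W_{vj} \Big)
= \sum_{u=1}^{i-1} \mathrm{Cov}(W_{ui}, W_{uj} + W_{ij}).$$

But, from Exercise 4.5.4,

$$\mathrm{Cov}(W_{ui}, W_{uj} + W_{ij}) = \mathrm{Cov}(W_{ui}, W_{uj}) + \mathrm{Cov}(W_{ui}, W_{ij}) = 0.$$

Hence $\mathrm{Cov}(W_i, W_j) = 0$ and

$$\mathrm{Var}\, J = \sum_{i=2}^{k} \mathrm{Var}\, W_i. \qquad (4.2.15)$$

Now $W_i$ is the Mann-Whitney-Wilcoxon statistic computed on the $i$th
sample versus the combined data in the first $i - 1$ samples. If we let
$N_i = \sum_{j=1}^{i} n_j$, where $N_1 = n_1$ and $N_k = N$, then, from (3.2.3), $\mathrm{Var}\, W_i =
n_i N_{i-1}(N_i + 1)/12$. Hence

$$\mathrm{Var}\, J = \frac{1}{12} \sum_{i=2}^{k} n_i N_{i-1}(N_i + 1). \qquad (4.2.16)$$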
Jonckheere (1954) developed the cumulant generating function and from
this provided the following alternate equation for (4.2.16):

$$\mathrm{Var}\, J = \frac{1}{72} \Big\{ N^2(2N + 3) - \sum_{i=1}^{k} n_i^2(2n_i + 3) \Big\}. \qquad (4.2.17)$$

From (4.2.13),

$$EJ = \sum_{i<j} n_i n_j/2 = \frac{1}{4} \Big\{ N^2 - \sum_{i=1}^{k} n_i^2 \Big\}, \qquad (4.2.18)$$

where the second equality follows from $(\sum n_i)^2 = \sum n_i^2 + 2 \sum_{i<j} n_i n_j$.
The Jonckheere-Terpstra test rejects $H_0: \theta_1 = \cdots = \theta_k$ in favor of
$H_A: \theta_1 \le \cdots \le \theta_k$ with at least one strict inequality, at approximate level
$\alpha$, when $J \ge EJ + Z_\alpha (\mathrm{Var}\, J)^{1/2}$, where $EJ$ and $\mathrm{Var}\, J$ are given by (4.2.18)
and (4.2.16) or (4.2.17), and $Z_\alpha$ is the upper $\alpha$ percentile from the standard
normal distribution.

The property that $W_2, \ldots, W_k$, defined in (4.2.14), are uncorrelated can
be strengthened. They are, in fact, independent. The independence can be
used to develop an alternative argument for the asymptotic normality of $J$.
Exercise 4.5.5 describes a relationship between $L$, (4.2.11), and statistics
based on pairwise Mann-Whitney-Wilcoxon statistics. Tryon and
Hettmansperger (1974) consider weighted linear combinations of Mann-
Whitney-Wilcoxon statistics. Other approaches can be found in Chacko
(1963) and Johnson and Mehrotra (1971). The monograph by Barlow et al.
(1972) provides an excellent overview of the area of statistical inference
under order restrictions.
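
A short Python sketch may help in checking that the two variance expressions (4.2.16) and (4.2.17) agree and in seeing how $J$ is assembled from pairwise counts. The data below are made up purely for illustration, the function names are assumptions, and ties are not handled.

```python
from itertools import accumulate

def jonckheere(samples):
    """J = sum over i < j of the number of pairs with X_vj > X_ui (no ties)."""
    total = 0
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            total += sum(1 for x in samples[i] for y in samples[j] if y > x)
    return total

def null_mean(sizes):
    n_total = sum(sizes)
    return (n_total ** 2 - sum(n * n for n in sizes)) / 4.0            # (4.2.18)

def null_var_sum(sizes):
    cum = list(accumulate(sizes))                                     # N_1, ..., N_k
    return sum(sizes[i] * cum[i - 1] * (cum[i] + 1)
               for i in range(1, len(sizes))) / 12.0                  # (4.2.16)

def null_var_closed(sizes):
    n_total = sum(sizes)
    return (n_total ** 2 * (2 * n_total + 3)
            - sum(n * n * (2 * n + 3) for n in sizes)) / 72.0         # (4.2.17)

samples = [[1.2, 3.4, 2.2], [2.8, 4.1, 3.9, 5.0], [4.4, 6.3, 5.7]]    # made-up data
sizes = [len(s) for s in samples]
j_stat = jonckheere(samples)
print(j_stat, null_mean(sizes), null_var_sum(sizes), null_var_closed(sizes))
```

For these sample sizes both variance formulas return the same value, which is a useful sanity check before standardizing $J$ against the normal percentile.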


4.3. THE TWO-WAY LAYOUT: THE FRIEDMAN TEST

In the last section we considered the comparison of k samples to detect
significant differences among the sampled populations. In the randomization
model, N subjects would be randomly assigned to k treatments.
Existing differences among the k treatments may be obscured by relatively
large variability of subjects within the samples. Often this problem can be
alleviated by dividing the subjects into more homogeneous subgroups or
blocks. Comparisons are then carried out within the blocks.

We restrict attention to the complete randomized block design with one
observation per cell. Hence we have $N = nk$ subjects divided into n blocks
and subjects are assigned to the k treatments at random.

Repeated measures on subjects also provides an important example. In
this application, a single subject forms a block and k measurements are
made on the subject. The order in which the k measurements are taken is
often randomized and we wish to detect a consistent pattern of measurement
differences among the subjects. For example, n judges ranking k
items could be analyzed using this design.

The sampling model can be defined in two ways: (1) by $X_{ij}$, $i =
1, \ldots, n$, $j = 1, \ldots, k$, with cdf $F_i(x - \theta_j)$, $F_i \in \Omega_0$, $i = 1, \ldots, n$. Hence $F_i$
is the distribution of the observations in the $i$th block, and, within the $i$th
block, $\theta_j$ is the median corresponding to the $j$th treatment. All observations
are independent. (2) by $(X_{i1}, \ldots, X_{ik})$ with joint cdf $F_i(x_1 - \theta_1, \ldots, x_k - \theta_k)$,
$i = 1, \ldots, n$, where $F_i(x_1, \ldots, x_k) = F_i(t_1, \ldots, t_k)$ and $(t_1, \ldots, t_k)$
is a permutation of $(x_1, \ldots, x_k)$. Hence the random variables
$X_{i1} - \theta_1, \ldots, X_{ik} - \theta_k$ are said to be exchangeable. This is the appropriate
model for repeated measures where it is not appropriate to assume
independence within a block.

Example 4.3.1. Todd et al. (1980) attempt to isolate a formal mathematical
transformation that a human subject would perceive as descriptive of
growth. The idea that a geometric transformation might be helpful in 
describing morphological change can be traced back to the work of D’Arcy 
Wentworth Thompson in the early 1900s. Most of the work had been of a 
qualitative nature until Todd et al. studied the effects of several specific 
mathematical transformations on infant facial profiles. They used five 
different transformations: cardioidal strain (CS), spiral strain (SS), affine 
shear (AS), reflected shear (RS), and rotation (R). The strain transforms 
tend to change circles into heart-shaped figures, and the shear transforms 
tend to change circles into diagonally oriented ellipses. The article contains 
illustrations. 

After preliminary research, the authors hypothesized that the effects of a 
cardioidal strain are perceptually equivalent to the morphological changes 
produced by normal growth of a human head. Subjects were shown 
different sequences of five facial profiles. The sequences were designed so 
that the perceived age would increase from left to right. One set of 
sequences consisted of actual growth (AG) profiles traced from X-rays. 
Profiles generated by the mathematical transformations all began with an 
actual growth profile. There was also another group of control (C) se- 
quences in which all five profiles were identical. Subjects were asked to rate 
each sequence from 0 to 4 on the basis of its resemblance to actual growth. 

We have a two-way layout with k = 7 “treatments”: AG, CS, SS, AS,
RS, R, and C; and the subjects, producing repeated measurements, constitute
the blocks. The data, presented in Example 4.3.2, hopefully will reject
the null hypothesis of no difference among these treatments and reveal an
association of the strain transformations CS and SS with actual growth.

We wish to test $H_0: \theta_1 = \cdots = \theta_k$ versus $H_A$: not all $\theta$'s are equal. Let
$R_{ij}$ be the rank of $X_{ij}$ among $X_{i1}, \ldots, X_{ik}$, the observations in the $i$th block.
Let $R_j = \sum_{i=1}^{n} R_{ij}$ be the sum of ranks corresponding to the $j$th treatment.
The ranks can be displayed in a two-way array as follows:

                    Treatments
Blocks      1        2        ...      k

  1        R_11     R_12      ...     R_1k
  2        R_21     R_22      ...     R_2k
  .         .        .                 .
  n        R_n1     R_n2      ...     R_nk

           R_1      R_2       ...     R_k

Under the null hypothesis $R_{i1}, \ldots, R_{ik}$ are distributed according to the
results in Theorem 3.2.1 even under the exchangeability model. Hence
$ER_{ij} = (k + 1)/2$, $\mathrm{Var}\, R_{ij} = (k^2 - 1)/12$ and

$$ER_j = n(k + 1)/2$$
$$\mathrm{Var}\, R_j = n(k^2 - 1)/12 \qquad (4.3.1)$$
$$\mathrm{Cov}(R_i, R_j) = -n(k + 1)/12;$$

see Exercise 4.5.6.

Friedman (1937) proposed a statistic of the form

$$K = \sum_{j=1}^{k} c_{jn}^2 \left\{ \frac{R_j - n(k + 1)/2}{\sqrt{n(k^2 - 1)/12}} \right\}^2, \qquad (4.3.2)$$

where the weighting constants are chosen so that $K$ has an asymptotic
$\chi^2(k - 1)$ distribution.

Let $T' = (T_1, \ldots, T_k)$ where

$$T_j = c_{jn} n^{-1/2} [R_j - n(k + 1)/2]. \qquad (4.3.3)$$

Then the asymptotic distribution of $T$ is $MVN(0, B)$ where

$$b_{ij} = \begin{cases} c_j^2 (k^2 - 1)/12 & i = j \\ -c_i c_j (k + 1)/12 & i \neq j \end{cases} \qquad (4.3.4)$$

and $c_{jn} \to c_j$, $j = 1, \ldots, k$. The proof of this result is outlined in Exercise
4.5.7. The result forms the basis for the asymptotic distribution theory
needed in the two-way layout, similar to the one-way layout.

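Further, Exercise 4.5.7 shows that Friedman's statistic,

$$K^* = \sum_{j=1}^{k} \left(1 - \frac{1}{k}\right) \left\{ \frac{R_j - n(k + 1)/2}{\sqrt{n(k^2 - 1)/12}} \right\}^2
= \frac{12}{nk(k + 1)} \sum_{j=1}^{k} [R_j - n(k + 1)/2]^2
= \frac{12}{nk(k + 1)} \sum_{j=1}^{k} R_j^2 - 3n(k + 1), \qquad (4.3.5)$$

rejects the null hypothesis $H_0: \theta_1 = \cdots = \theta_k$ at approximate level $\alpha$ if
$K^* \ge \chi^2_\alpha(k - 1)$, where $\chi^2_\alpha(k - 1)$ is the upper $\alpha$ percentile for a chi-square
distribution with $k - 1$ degrees of freedom.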

When the Friedman test rejects $H_0$ we can construct pairwise multiple
comparisons based on the rank sums: $R_1, \ldots, R_k$. From Exercise 4.5.7
and Theorem A2(b), $(R_i - R_j)/[\mathrm{Var}(R_i - R_j)]^{1/2}$ has an asymptotic $n(0,
1)$ distribution under $H_0$. Using (4.3.1) it is easy to see that $\mathrm{Var}(R_i - R_j)
= nk(k + 1)/6$. Hence, similar to (4.2.6) in the one-way layout, declare $\theta_i$
and $\theta_j$ significantly different, at overall level $\alpha$, if

$$|R_i - R_j| \ge Z_{\alpha'/2} [nk(k + 1)/6]^{1/2}, \qquad (4.3.6)$$

where $\alpha' = 2\alpha/k(k - 1)$ and $1 - \Phi(Z_{\alpha'/2}) = \alpha'/2$. By the same argument
as in the proof of Theorem 4.2.5, the probability of committing at least one
error, under $H_0$, is bounded above by $\alpha$.

Example 4.3.2. This example continues Example 4.3.1. Five subjects are
presented with “growth” sequences. They score each sequence from 0 to 4
with zero representing no perception of growth. The different sequences are
presented in random order and each subject sees 5 sequences for each of
the 7 types. In Table 4.4 the mean of the 5 scores for each type of sequence
is reported along with its rank among the 7 types. To test $H_0: \theta_1 = \cdots
= \theta_7$ versus $H_A$: not all equal at $\alpha = .05$, we reject $H_0$ if $K^* \ge \chi^2_{.05}(6) = 12.6$.
In (4.3.5), using $n = 5$ and $k = 7$, we have $K^* = 27.5$ and hence we reject
$H_0$ and claim, at approximately $\alpha = .05$, that there is a difference among
the treatments.

Table 4.4. Ratings of “Growth” Sequences (Artificial Data)

Subject   AG          CS          SS        AS        RS        R         C

1         3.9(7)^a    3.5(6)      2.8(5)    1.5(4)    0.5(2)    0.6(3)    0.2(1)
2         3.4(6.5)    3.4(6.5)    2.5(5)    1.0(4)    0.8(3)    0.1(1)    0.2(2)
3         3.8(7)      3.0(5)      3.1(6)    0.9(4)    0.6(3)    0.2(1)    0.4(2)
4         3.2(6)      3.4(7)      3.0(5)    1.2(4)    0.4(3)    0.2(1)    0.3(2)
5         3.7(7)      3.2(6)      2.7(5)    1.0(4)    0.2(2)    0.3(3)    0.1(1)

R_j       33.5        30.5        26        20        13        9         8

^a Number in parentheses is rank within the row.

There are $k(k - 1)/2 = 21$ pairwise comparisons. If we take the overall
error rate $\alpha = .21$ then the comparison error rate $\alpha' = 2\alpha/k(k - 1) = .01$
and $Z_{\alpha'/2} = 2.576$. From (4.3.6) a pair will be declared significantly different
if $|R_i - R_j| \ge 17.6$. This simple analysis shows that AG and CS are
significantly far from RS, R, and C and supports the hypothesis that the
cardioidal strain is perceived as growth while eliminating reflected shear
and rotation.
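
The computations in this example can be retraced from the rank sums in Table 4.4. The sketch below evaluates the last form of (4.3.5) and the pairwise cutoff in (4.3.6); the function names are illustrative assumptions, and no adjustment is made for the tie in the second block.

```python
from statistics import NormalDist
import math

def friedman(rank_sums, n):
    """K* = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1), last form of (4.3.5)."""
    k = len(rank_sums)
    return (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))

def critical_difference(alpha, n, k):
    """Cutoff in (4.3.6): Z_{alpha'/2} * sqrt(n k (k + 1) / 6)."""
    alpha_pair = 2.0 * alpha / (k * (k - 1))
    z = NormalDist().inv_cdf(1.0 - alpha_pair / 2.0)
    return z * math.sqrt(n * k * (k + 1) / 6.0)

rank_sums = {"AG": 33.5, "CS": 30.5, "SS": 26.0, "AS": 20.0,
             "RS": 13.0, "R": 9.0, "C": 8.0}
k_star = friedman(list(rank_sums.values()), n=5)
cut = critical_difference(0.21, n=5, k=7)
names = list(rank_sums)
flagged = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
           if abs(rank_sums[a] - rank_sums[b]) >= cut]
print(round(k_star, 1), round(cut, 1), flagged)
```

Printing the flagged pairs shows which differences clear the cutoff and which fall just short of it, which is worth inspecting when a difference sits near the boundary.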

Exercise 4.5.8 describes the behavior of $K^*$ for a sequence of alternatives
that converges to the null hypothesis. From Arnold (1981, p. 87), $(k - 1)F$,
which is $(k - 1)$ times the usual $F$ statistic for testing $H_0$, is asymptotically
noncentral chi-square with $k - 1$ degrees of freedom and noncentrality
parameter

$$\delta_F = \sum_{j=1}^{k} (\beta_j - \bar\beta)^2 / \sigma^2, \qquad \bar\beta = \sum_{j=1}^{k} \beta_j / k. \qquad (4.3.7)$$

We have assumed the same model as in Exercise 4.5.8.

Hence, as in the discussion of (4.2.10), the efficiency of $K^*$ relative to $F$
is

$$e(K^*, F) = \frac{k}{k + 1}\, 12 \sigma^2 \left( \int f^2(x)\,dx \right)^2. \qquad (4.3.8)$$

The striking point is that $K^*$ does not inherit the efficiency of the Wilcoxon
one- and two-sample tests relative to the t tests, as was the case with the
Kruskal-Wallis test, $H^*$. When the underlying distribution is normal,

$$e(K^*, F) = 3k/[(k + 1)\pi]. \qquad (4.3.9)$$

Hence, when $k = 2$, $e(K^*, F) = 2/\pi = 0.64$, the efficiency of the sign test
relative to the t test. See Exercise 4.5.9. This reflects the loss of information
incurred by ranking within blocks, especially for a small number of
treatments. This efficiency loss vanishes as $k$ increases, and $e(K^*, F) \to 3/\pi
= 0.955$ as $k \to \infty$. Following are some values of $(k, e(K^*, F))$: (2, 0.64),
(3, 0.72), (4, 0.76), (5, 0.80), (10, 0.87), (20, 0.91).
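
The tabulated efficiencies follow directly from (4.3.9) and are easy to regenerate. A minimal Python sketch; the function name is an illustrative assumption.

```python
import math

def friedman_efficiency_normal(k):
    """e(K*, F) = 3k / ((k + 1) * pi) at the normal model, as in (4.3.9)."""
    return 3.0 * k / ((k + 1) * math.pi)

table = {k: round(friedman_efficiency_normal(k), 2) for k in (2, 3, 4, 5, 10, 20)}
limit = 3.0 / math.pi          # limiting value as k grows
print(table, round(limit, 3))
```

The values climb steadily toward $3/\pi$, which makes the cost of ranking within blocks for small $k$ easy to see at a glance.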

Methods, to take the interblock information into account, in the general 
linear model setting, are presented in the next chapter. In the randomized 
block design, the lost efficiency can be recovered. These methods, which 
generally involve constructing a rank statistic in the residuals after the 
block effect has been removed, are more complicated and less easy to apply 
than Friedman’s test. The Friedman test, despite its lower efficiency for 
small k, is a versatile technique, useful for both the randomized block 
design and the repeated measures design. 

Friedman’s test can be extended to the case of several observations per 
cell. The general case of unequal cell-sample sizes is treated by Bernard and 
van Elteren (1953). When the cells each have m observations, the statistic 
with its asymptotic distribution, under Hq, is given in Exercise 4.5.10. 

As in the case of a one-way layout, the omnibus alternative may not be
appropriate. We may wish to test $H_0: \theta_1 = \cdots = \theta_k$ versus $H_A: \theta_1 \le \cdots
\le \theta_k$ with at least one strict inequality. The test statistic, in the two-way
layout with one observation per cell, proposed by Page (1963), is

$$Q = n^{-1/2} \sum_{j=1}^{k} \left( j - \frac{k+1}{2} \right) \left[ R_j - \frac{n(k+1)}{2} \right], \qquad (4.3.10)$$

similar to (4.2.11). Under $H_0$, from Exercise 4.5.11, $EQ = 0$ and

$$\mathrm{Var}\, Q = k^2(k^2 - 1)(k + 1)/144. \qquad (4.3.11)$$

Further $Q/(\mathrm{Var}\, Q)^{1/2}$ is asymptotically $n(0, 1)$. Hence Page's test rejects
$H_0$, at approximate level $\alpha$, if $Q \ge Z_\alpha (\mathrm{Var}\, Q)^{1/2}$ where $1 - \Phi(Z_\alpha) = \alpha$.
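
The null variance (4.3.11) can be checked by brute force for small designs, since under the null hypothesis every within-block ranking is equally likely. The sketch below assumes that Page's statistic has the centered and scaled form $Q = n^{-1/2} \sum_j [j - (k+1)/2][R_j - n(k+1)/2]$; that form, the function names, and the design sizes are assumptions made for illustration.

```python
from itertools import permutations, product
import math

def page_q(block_ranks):
    """Assumed form: Q = n^(-1/2) * sum_j (j - (k+1)/2) * (R_j - n(k+1)/2)."""
    n, k = len(block_ranks), len(block_ranks[0])
    rank_sums = [sum(row[j] for row in block_ranks) for j in range(k)]
    return sum((j + 1 - (k + 1) / 2.0) * (rank_sums[j] - n * (k + 1) / 2.0)
               for j in range(k)) / math.sqrt(n)

def exact_null_variance(n, k):
    """Variance of Q over all (k!)^n equally likely sets of within-block ranks."""
    values = [page_q(blocks)
              for blocks in product(permutations(range(1, k + 1)), repeat=n)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def formula_variance(k):
    return k ** 2 * (k ** 2 - 1) * (k + 1) / 144.0      # (4.3.11)

print(exact_null_variance(2, 3), formula_variance(3))
print(exact_null_variance(3, 4), formula_variance(4))
```

Under this assumed scaling the enumerated variance does not depend on the number of blocks, which is consistent with (4.3.11) containing no $n$.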

An analog of the Jonckheere-Terpstra test J, (4.2.13), in the two-way 
layout consists in computing J for each block and then combining the 
statistics across the blocks. Skillings and Wolfe (1978) consider this statistic. 
Their statistic allows for unequal numbers of observations per cell and for
differential weighting of the different $J$ statistics. Let $J_i$ denote $J$ computed
on the $i$th block and let $J^*$, (4.3.12), denote the combination of
$J_1, \ldots, J_n$ across the blocks. Then under the null hypothesis the mean,
variance, and asymptotic normality of $J^*$ are given in Exercise 4.5.12.

Another analog of the Jonckheere-Terpstra test, (4.2.13), in the two-way
layout would be the statistic $A$, the sum over pairs $i < j$ of the Wilcoxon
signed rank statistics computed on the $i$th and $j$th paired samples. These
statistics have been studied by Hollander (1967) and Puri and Sen (1968). It
might be hoped that the $A$ test does not suffer the efficiency loss (similar to
the Friedman test) for small $k$ since interblock information is taken into
account. Page's test relies strictly on ranking within blocks.

Pirie (1974) made an extensive comparison of tests based on $A$ and $Q$.
Pirie considers asymptotic efficiency for both $k$ fixed, $n \to \infty$, and $n$ fixed,
$k \to \infty$. He shows that superior test performance depends on the underlying
distribution, and the values of $k$ and $n$. Hence $A$ is not necessarily more
efficient than $Q$. In addition, the statistic $A$ is not distribution free under
$H_0$ and is not easy to implement. For these reasons we recommend the test
based on $Q$ and we do not present here any of the details concerning the
test based on $A$.

One final variation, the balanced incomplete block design, is discussed
in Exercise 4.5.13. Durbin's (1951) test, with its asymptotic null distribution,
is presented there. The efficiency of Durbin's test relative to the F test was
computed by van Elteren and Noether (1959). When there are $t$ treatments
and there are $k < t$ treatments ranked within each block, the efficiency is
identical to $e(K^*, F)$, (4.3.8). Hence, for example, in paired comparisons in
which $n$ judges compare $t$ objects, pairwise, we have $k = 2$, and the
efficiency is once again that of the sign test relative to the t test.


4.4. RANK CORRELATION AND ASSOCIATION 

In this section we introduce the ideas of rank correlation and association as
measures of agreement between two sets of rankings. We consider, in detail,
a bivariate model in which the data is ranked separately within each
component. The distribution theory will also cover the case in which the
original data is a set of ranks. This case commonly arises when judges are
asked to express their preferences by assigning ranks directly to a set of
objects.

We also interpret the various tests proposed in the earlier sections of this
chapter in the light of how they assess the degree of agreement between the
observed rank sums and the alternative hypotheses. This interpretation
provides additional insight into how the tests work and illustrates the
connection between correlation and the sums of squares. Fisher (1970) used
this connection to great advantage in developing the analysis of variance
out of previous correlation approaches.

The sampling model consists in a sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ from
$F(x, y)$ where $F(\cdot, \cdot)$ is absolutely continuous with absolutely continuous
marginal cdfs $F_X(\cdot)$ and $F_Y(\cdot)$. We think of the data arranged in two rows:

$$\begin{array}{cccc} X_1 & X_2 & \cdots & X_n \\ Y_1 & Y_2 & \cdots & Y_n \end{array} \qquad (4.4.1)$$

and, without loss of generality, we suppose that $X_1 < \cdots < X_n$. Let
$R_1, \ldots, R_n$ be the corresponding ranks of $Y_1, \ldots, Y_n$, then we have the
corresponding rank array:

$$\begin{array}{cccc} 1 & 2 & \cdots & n \\ R_1 & R_2 & \cdots & R_n \end{array} \qquad (4.4.2)$$


We now present the two major methods for assessing the degree of
agreement between the two sets of rankings in (4.4.2). See Kruskal (1958)
for a historical overview of measures of association. Spearman's (1904)
measure is simply the product-moment correlation coefficient computed on
the ranks:

$$
r_s = \frac{\sum_{i=1}^{n} [i - (n+1)/2][R_i - (n+1)/2]}
           {\left\{ \sum_{i=1}^{n} [i - (n+1)/2]^2 \sum_{i=1}^{n} [R_i - (n+1)/2]^2 \right\}^{1/2}}.
\tag{4.4.3}
$$

Since the ranks are a rearrangement of the integers from 1 to n, the
denominator is $\sum [i - (n+1)/2]^2 = n(n^2 - 1)/12$ from Theorem A11.
Further, since $\sum [i - (n+1)/2] = 0$,

$$
r_s = \frac{12}{n(n^2 - 1)} \sum_{i=1}^{n} [i - (n+1)/2] R_i
    = 1 - \frac{6}{n(n^2 - 1)} \sum_{i=1}^{n} (i - R_i)^2 .
\tag{4.4.4}
$$

The last equality is the most convenient computationally; see Exercise
4.5.14. Since $r_s$ is a correlation coefficient, it has the usual property that
$-1 \le r_s \le 1$. It is easy to check that the extremes are attainable. If we have
independence, so that the two rankings are independent, the joint distribution
of $R_1, \ldots, R_n$ is uniform on the $n!$ permutations. Hence, if there is no
association, $E r_s = 0$, from (4.4.4). Thus, $r_s$ should be between $-1$ and $+1$,
and around 0 in the case of independence.
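As a small numerical illustration of the two forms of Spearman's coefficient, the following sketch computes $r_s$ both as the product-moment correlation of the ranks, as in (4.4.3), and from the squared rank differences, as in the last form of (4.4.4). The data and the helper names (`ranks`, `spearman_product_moment`, `spearman_shortcut`) are hypothetical, and the sketch assumes there are no ties, in keeping with the continuous sampling model.

```python
def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r


def spearman_product_moment(x, y):
    """r_s as the product-moment correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) + 1) / 2
    num = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    den = sum((a - mean) ** 2 for a in rx)  # equals n(n^2 - 1)/12 without ties
    return num / den


def spearman_shortcut(x, y):
    """r_s from the squared rank differences."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical data, for illustration only.
x = [2.1, 3.4, 1.0, 5.6, 4.2]
y = [1.9, 2.5, 3.0, 6.1, 4.4]
print(spearman_product_moment(x, y), spearman_shortcut(x, y))
```

Both functions should agree up to floating-point error whenever the data are free of ties.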

The second measure of association was introduced by Kendall (1938).
We say that the pairs $(X_i, Y_i)$ and $(X_j, Y_j)$ are concordant if $X_i > X_j$ and
$Y_i > Y_j$ or if $X_i < X_j$ and $Y_i < Y_j$. If $\mathrm{sgn}(x) = 1, 0, -1$ as $x > 0$, $x = 0$,
$x < 0$, then the pairs are concordant if $\mathrm{sgn}(X_j - X_i)\,\mathrm{sgn}(Y_j - Y_i) = 1$. In a
similar fashion we call the pairs discordant if
$\mathrm{sgn}(X_j - X_i)\,\mathrm{sgn}(Y_j - Y_i) = -1$. Let P and Q denote the number of
concordant and discordant pairs, respectively. Then the excess of
concordance over discordance is

$$
S = P - Q = \mathop{\sum\sum}_{i<j} \mathrm{sgn}(X_j - X_i)\,\mathrm{sgn}(Y_j - Y_i).
\tag{4.4.5}
$$

The possible values of S range from $-n(n-1)/2$ to $n(n-1)/2$. For
example, $\max S = n(n-1)/2$ occurs when there is perfect agreement in the
order of $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$, that is, perfect agreement in their
rankings. Kendall (1938) suggested the coefficient

$$
\tau = \frac{S}{\max S} = \frac{2(P - Q)}{n(n-1)} = 1 - \frac{4Q}{n(n-1)},
\tag{4.4.6}
$$

since $P + Q = n(n-1)/2$.

Note that the Y ordering can be transformed into the X ordering by
successively interchanging neighboring pairs of Y values. Then Q is the
needed number of interchanges or inversions that will bring the Y's into the
same order as the X's. Hence Q (or $\tau$) can be thought of as measuring the
disarray of the Y's relative to the X's.

It is easy to check that $-1 \le \tau \le 1$, and the extremes are attainable.
Further, if we have independence of X and Y, then $E\,\mathrm{sgn}(Y_j - Y_i) = 0$ and
$E\tau = 0$, similar to $r_s$.

Note that since $X_1 < \cdots < X_n$,
$Q = \mathop{\sum\sum}_{i<j} s(Y_i - Y_j) = \mathop{\sum\sum}_{i<j} s(R_i - R_j)$,
where $s(x) = 1$ if $x > 0$ and 0 otherwise. This shows that Q can be
computed either from the raw data or the ranks. A simple modification of
Q can be developed by weighting the inversions proportional to the
distance apart of the ranks. Hence, for example, $s(R_1 - R_2)$ will get less weight
in the measure of disarray than $s(R_1 - R_5)$. If we take $j - i$ for the weight of
$s(R_i - R_j)$, then the new measure is

$$
Q^* = \mathop{\sum\sum}_{i<j} (j - i)\, s(R_i - R_j).
$$

In the next theorem we show that Spearman's $r_s$ is a function of $Q^*$. This
will offer some insight into the relationship between $r_s$ and $\tau$.
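The counting definitions of P, Q, S, and Kendall's coefficient can be sketched directly. The code below is only an illustration under the no-ties assumption of the sampling model; the function names and the small data set are hypothetical.

```python
from itertools import combinations


def sgn(v):
    """Sign function: 1, 0, or -1."""
    return (v > 0) - (v < 0)


def kendall_counts(x, y):
    """Return (P, Q), the numbers of concordant and discordant pairs."""
    p = q = 0
    for i, j in combinations(range(len(x)), 2):
        product = sgn(x[j] - x[i]) * sgn(y[j] - y[i])
        if product > 0:
            p += 1
        elif product < 0:
            q += 1
    return p, q


def kendall_tau(x, y):
    """tau = S / max S with S = P - Q and max S = n(n - 1)/2."""
    n = len(x)
    p, q = kendall_counts(x, y)
    return 2 * (p - q) / (n * (n - 1))


# Hypothetical data, for illustration only.
x = [2.1, 3.4, 1.0, 5.6, 4.2]
y = [1.9, 2.5, 3.0, 6.1, 4.4]
p, q = kendall_counts(x, y)
print(p, q, kendall_tau(x, y), 1 - 4 * q / (len(x) * (len(x) - 1)))
```

The last two printed values are the two equivalent forms of the coefficient, one from S and one from Q alone.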

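Theorem 4.4.1.

$$
r_s = 1 - \frac{12}{n(n^2 - 1)} \mathop{\sum\sum}_{i<j} (j - i)\, s(R_i - R_j),
\qquad
\tau = 1 - \frac{4}{n(n-1)} \mathop{\sum\sum}_{i<j} s(R_i - R_j).
$$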

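Proof. We have already pointed out the equation for $\tau$ in (4.4.6). From
(4.4.4), it is sufficient to show $Q^* = \{\sum (i - R_i)^2\}/2$.

Note that by interchanging the $i, j$ notation we can write
$\mathop{\sum\sum}_{j<i} j\, s(R_i - R_j) = \mathop{\sum\sum}_{i<j} i\, s(R_j - R_i)$. Now,

$$
\begin{aligned}
Q^* &= \mathop{\sum\sum}_{i<j} j\, s(R_i - R_j) - \mathop{\sum\sum}_{i<j} i\, s(R_i - R_j) \\
    &\qquad + \mathop{\sum\sum}_{j<i} j\, s(R_i - R_j) - \mathop{\sum\sum}_{j<i} j\, s(R_i - R_j) \\
    &= \mathop{\sum\sum}_{i \ne j} j\, s(R_i - R_j)
       - \mathop{\sum\sum}_{i<j} i\,[s(R_i - R_j) + s(R_j - R_i)] \\
    &= \sum_{j=1}^{n} j \sum_{i=1}^{n} s(R_i - R_j) - \sum_{i=1}^{n-1} i \sum_{j=i+1}^{n} 1 \\
    &= \sum_{j=1}^{n} j\,(n - R_j) - \sum_{i=1}^{n-1} i\,(n - i).
\end{aligned}
$$

The last equality follows since $\sum_{i=1}^{n} s(R_i - R_j)$ is the number of
$R_1, \ldots, R_n$ that exceed $R_j$, which is $n - R_j$. Replace $n - 1$ by $n$ in the
upper limit of the second sum and combine the sums to get

$$
Q^* = \sum_{j=1}^{n} j\,(j - R_j).
$$

Next note that

$$
\sum_{i=1}^{n} (i - R_i)^2 = \sum_{i=1}^{n} i^2 - 2\sum_{i=1}^{n} i R_i + \sum_{i=1}^{n} R_i^2
  = 2 \sum_{i=1}^{n} i\,(i - R_i) = 2Q^*
$$

and the formula for $r_s$ follows from (4.4.4).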
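The identity at the heart of the proof, $2Q^* = \sum (i - R_i)^2$, is easy to check numerically. The sketch below does so for a random permutation of the ranks; the function name `q_and_qstar` and the seed are arbitrary choices made for this illustration.

```python
import random


def q_and_qstar(r):
    """Q counts the inversions in r; Q* weights each inversion by j - i."""
    q = qstar = 0
    n = len(r)
    for i in range(n):
        for j in range(i + 1, n):
            if r[i] > r[j]:      # s(R_i - R_j) = 1
                q += 1
                qstar += j - i
    return q, qstar


random.seed(0)                   # arbitrary seed
n = 8
r = list(range(1, n + 1))
random.shuffle(r)                # ranks R_1..R_n, with the X's already ordered

q, qstar = q_and_qstar(r)
d2 = sum((i - ri) ** 2 for i, ri in enumerate(r, start=1))

tau = 1 - 4 * q / (n * (n - 1))
r_s_from_qstar = 1 - 12 * qstar / (n * (n * n - 1))
r_s_from_d2 = 1 - 6 * d2 / (n * (n * n - 1))
print(q, qstar, d2, tau, r_s_from_qstar, r_s_from_d2)
```

Since Q gives every inversion weight 1 while $Q^*$ gives weight $j - i$, the comparison of the two printed coefficients shows how the extra weighting acts on a particular permutation.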

The theorem shows that, except in the extreme cases, $r_s$ and $\tau$ will not
generally be equal. Since $r_s$ gives greater weight to inversions of ranks that
are farther apart, generally $r_s$ is larger in absolute value than $\tau$. This means
that $r_s$ seems to indicate there is more agreement (or disagreement) between
rankings than $\tau$, but this difference is an artifact of their construction.
Kendall and Stuart (1973) point out that when X and Y are independent, $r_s$
and $\tau$ are highly correlated. In fact, their correlation declines from 1 at
n = 2 to .98 at n = 5 and tends to 1 as n increases. Hence, for testing
independence they are asymptotically equivalent under the null hypothesis.

The major reference on the properties of $r_s$ and $\tau$ is Kendall (1970). In
his book on rank correlation Kendall claims that from many practical and
most theoretical points of view $\tau$ is preferable to $r_s$; see Kendall (1970,
Section 1.24). As an indication of the theoretical difficulties encountered by
$r_s$, consider the population characteristics estimated by $\tau$ and $r_s$. Under the
sampling model $(X_1, Y_1), \ldots, (X_n, Y_n)$ i.i.d. $F(x, y)$ introduced at the
beginning of this section, and from (4.4.5) and (4.4.6),

$$
\begin{aligned}
E\tau &= E\{\mathrm{sgn}[(X_2 - X_1)(Y_2 - Y_1)]\} \\
      &= P\{(X_2 - X_1)(Y_2 - Y_1) > 0\} - P\{(X_2 - X_1)(Y_2 - Y_1) < 0\} \\
      &= 1 - 2P\{(X_2 - X_1)(Y_2 - Y_1) < 0\}.
\end{aligned}
\tag{4.4.7}
$$

Thus, the parameter of interest is the probability of a discordance. Clearly,
when X and Y are independent, the probability of discordance is 1/2 and
$E\tau = 0$. The situation for $r_s$ is much more complicated and there is no
simple population characteristic for $r_s$. Kendall (1970, Chapter 9) shows

$$
E r_s = \frac{3}{n+1} \left[ E\tau + (n - 2)(2\gamma - 1) \right]
$$

where $\gamma = P[(X_2 - X_1)(Y_3 - Y_1) > 0]$, which is called a type 2 concordance.
For large n, $E r_s \doteq 6(\gamma - 1/2)$, which is not so easy to interpret as the
probability of a simple concordance. As a result, interpretation of $r_s$ is
mostly confined to the observation that $r_s$ is a correlation coefficient
computed on the two rankings. However, as will be seen later in this
section, this property makes $r_s$ useful in the motivation and interpretation
of the rank tests in the one- and two-way layouts.

Under the null hypothesis that X and Y are independent, the Y ranks
are uniformly distributed over the permutations of the first n integers. The
results in Exercise 4.5.15 show that $E r_s = 0$, $\mathrm{Var}\, r_s = 1/(n - 1)$, and
$(n - 1)^{1/2} r_s$ is asymptotically n(0, 1). Hence it is simple to construct a
hypothesis test based on $r_s$.

The asymptotic normality of $\tau$, under the null hypothesis, is outlined in
Exercise 2.10.35(b). You are asked to construct the projection and argue the
limiting normality. The $\mathrm{Var}\,\tau = 2(2n + 5)/[9n(n - 1)]$ and it is pointed out
that it takes a rather tedious counting argument to derive it. In the next
theorem, due to Jirina (1976), we present a very clever argument which
yields the $\mathrm{Var}\,\tau$ and the asymptotic normality. The argument rests on the
development of a recursion formula for the distribution of $\tau$, similar to that
of the Mann-Whitney-Wilcoxon statistic in Theorem 3.2.3, and the
recognition of it as a convolution.

Theorem 4.4.2. Suppose X and Y are independent. Let $P(S = s) = p_n(s)$,
where S in (4.4.5) is based on a sample of size n. Then

$$
p_n(s) = \frac{1}{n} \sum_{j=1}^{n} p_{n-1}(s - 2j + n + 1)
$$

for $n \ge 3$ and $s = -n(n-1)/2, \ldots, n(n-1)/2$, and $p_2(s) = 1/2$, $s = -1, 1$.

Further, $ES = 0$, $\mathrm{Var}\, S = (n - 1)n(2n + 5)/18$, and $S/(\mathrm{Var}\, S)^{1/2}$ is
asymptotically n(0, 1).

Proof. Suppose $X_1 < \cdots < X_n$ so that we need only consider $R_1, \ldots, R_n$,
the ranks of $Y_1, \ldots, Y_n$. Let $\bar{p}_n(s)$ be the number of the permutations
of $1, \ldots, n$ such that $S = s$. These permutations can be built from the
permutations of $1, \ldots, n - 1$ by inserting the integer n. Hence

$$
\begin{aligned}
\bar{p}_n(s) &= \bar{p}_{n-1}(s - 0 + (n - 1)) + \bar{p}_{n-1}(s - 1 + (n - 2)) + \cdots \\
             &\qquad + \bar{p}_{n-1}(s - (n - 2) + 1) + \bar{p}_{n-1}(s - (n - 1)).
\end{aligned}
$$

Recall $S = P - Q$ and consider $\bar{p}_{n-1}(s + (n - 1))$. This term arises from
$n, R_1, \ldots, R_{n-1}$. The n in the first position adds 0 to P and $n - 1$ to Q.
Hence S computed on $R_1, \ldots, R_{n-1}$ is reduced by $n - 1$ when we include
n at the beginning. In the second term we consider $R_1, n, R_2, \ldots, R_{n-1}$.
Hence P is increased by 1 and Q by $n - 2$.
We can now write

$$
\bar{p}_n(s) = \sum_{j=1}^{n} \bar{p}_{n-1}[s + (n - j) - (j - 1)]
             = \sum_{j=1}^{n} \bar{p}_{n-1}[s - 2j + n + 1].
$$

Since the $n!$ permutations are equally likely under independence,

$$
p_n(s) = \frac{1}{n!}\,\bar{p}_n(s) = \frac{1}{n} \sum_{j=1}^{n} p_{n-1}[s - 2j + n + 1].
$$

It is obvious that $p_2(s) = 1/2$, $s = -1, 1$.

Next, let $k = 2j - n - 1$ and rewrite $p_n(s)$ as

$$
p_n(s) = {\sum_{k}}' \, p_{n-1}(s - k)\,\frac{1}{n}.
$$

The $'$ on the summation indicates that k takes the values
$-n + 1, -n + 3, \ldots, n - 3, n - 1$. Define $q_n(k) = 1/n$ if
$k = -n + 1, -n + 3, \ldots, n - 3, n - 1$, and 0 otherwise. Then

$$
p_n(s) = \sum_{k} p_{n-1}(s - k)\, q_n(k)
$$

and $p_n(\cdot)$ is the convolution of the two discrete mass functions $p_{n-1}$ and
$q_n$; see Definition A3 in the Appendix. We write $p_n = q_n * p_{n-1}$. Repeating
this argument shows

$$
p_n = q_n * q_{n-1} * \cdots * q_3 * p_2 .
$$

But $p_2 = q_2$; hence $p_n = q_n * \cdots * q_2$. This means that S has the same
distribution as $\sum_{i=2}^{n} Z_i$, where $Z_2, \ldots, Z_n$ are independent and
$P(Z_i = k) = 1/i$ for $k = -i + 1, -i + 3, \ldots, i - 3, i - 1$. The variance is the
sum of the variances:

$$
\mathrm{Var}\, S = \sum_{i=2}^{n} \mathrm{Var}\, Z_i = \sum_{i=2}^{n} \frac{1}{3}(i^2 - 1)
                 = (n - 1)n(2n + 5)/18 .
$$

Note that $EZ_i = 0$ so $\mathrm{Var}\, Z_i = EZ_i^2 = (i^2 - 1)/3$. Since $Z_2, \ldots, Z_n$ are
independent but not identically distributed we apply Theorem A6 to
establish the asymptotic normality of $S/(\mathrm{Var}\, S)^{1/2}$. In the notation of
Theorem A6, $B_n^2 = \sum_{i=2}^{n} \mathrm{Var}\, Z_i$, which is of order $n^3$. Since $|Z_i| \le i - 1$,

$$
E[Z_i^2 I(|Z_i| > \epsilon B_n)] \le (i - 1)^2 P(|Z_i| > \epsilon B_n)
  \le (i - 1)^2 (EZ_i^2)/(\epsilon^2 B_n^2).
$$

Hence, the Lindeberg condition becomes

$$
\frac{1}{B_n^2} \sum_{i=2}^{n} E[Z_i^2 I(|Z_i| > \epsilon B_n)]
  \le \frac{1}{\epsilon^2 B_n^4} \sum_{i=2}^{n} (i - 1)^2\, \frac{i^2 - 1}{3} \to 0,
$$

since the right side is of order $n^5/n^6$: by Theorem A21,
$\sum_{i=1}^{n} i^4 = n(n + 1)(2n + 1)(3n^2 + 3n - 1)/30$, which is of order $n^5$.

This completes the proof.
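The convolution representation gives a direct way to tabulate the exact null distribution of S. The following sketch builds $p_n$ by repeatedly convolving with the uniform mass functions $q_m$, and compares the result with a brute-force enumeration over all permutations. The function names are hypothetical, and exact rational arithmetic is used so the comparison with $(n-1)n(2n+5)/18$ is exact.

```python
from fractions import Fraction
from itertools import permutations


def null_distribution_of_s(n):
    """Exact null pmf of S for n >= 2, built as q_n * ... * q_3 * p_2."""
    pmf = {-1: Fraction(1, 2), 1: Fraction(1, 2)}       # p_2
    for m in range(3, n + 1):
        new = {}
        for s, prob in pmf.items():
            for k in range(-m + 1, m, 2):               # support of q_m
                new[s + k] = new.get(s + k, 0) + prob * Fraction(1, m)
        pmf = new
    return pmf


def brute_force(n):
    """Tabulate S = P - Q over all n! equally likely permutations."""
    counts = {}
    for perm in permutations(range(1, n + 1)):
        s = sum(1 if perm[j] > perm[i] else -1
                for i in range(n) for j in range(i + 1, n))
        counts[s] = counts.get(s, 0) + 1
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in counts.items()}


n = 5
pmf = null_distribution_of_s(n)
variance = sum(s * s * p for s, p in pmf.items())       # ES = 0 by symmetry
print(variance, Fraction((n - 1) * n * (2 * n + 5), 18))
```

The recursion needs on the order of $n^3$ operations, whereas the enumeration grows like $n!$, so only the former is practical beyond very small n.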


We now present an example to compare $r_s$ and $\tau$, and to illustrate the
various calculations.


Example 4.4.1. From the World Almanac 1982, we list the Olympic times,
in seconds, of the men's 400-meter dash, 1500-meter run, and marathon
(Table 4.5). Ties are assigned the average rank. If there are extensive ties in
a data set, the reader should consult Kendall (1970, Chapter 3) since the
various formulas for $r_s$ (and for $\tau$) are no longer computationally
equivalent. We have two ties in the 400-meter data, in 1932 and 1948, but they do
not affect the computations very much. In Table 4.6 we display $r_s$, (4.4.3),
and $\tau$ from (4.4.5). The numbers in parentheses are $r_s/(\mathrm{Var}\, r_s)^{1/2}$ and
$\tau/(\mathrm{Var}\,\tau)^{1/2}$. Table 4.6 illustrates the tendency of $r_s$ to be numerically more
extreme than $\tau$. However, in terms of standard deviations, $\tau$ is further from
0 than $r_s$.

Table 4.5. Olympic Times in Seconds with Ranks

Year    400 m         1500 m        Marathon*
1896    54.2 (20)     273.2 (20)    3530 (18)
1900    49.4 (16)     246.0 (18)    3585 (19)
1904    49.2 (15)     245.4 (17)    5333 (20)
1906    53.2 (19)     252.0 (19)    3084 (16)
1908    50.0 (18)     243.4 (16)    3318 (17)
1912    48.2 (14)     236.8 (14)    2215 (14)
1920    49.6 (17)     241.8 (15)    1956 (11)
1924    47.6 (12)     233.6 (13)    2483 (15)
1928    47.8 (13)     233.2 (12)    1977 (12)
1932    46.2 (8.5)    231.2 (11)    1896 (10)
1936    46.5 (10)     227.8 (9)     1759 (9)
1948    46.2 (8.5)    229.8 (10)    2092 (13)
1952    45.9 (7)      225.2 (8)     1383 (7)
1956    46.7 (11)     221.2 (7)     1500 (8)
1960    44.9 (5)      215.6 (2)      916 (5)
1964    45.1 (6)      218.1 (4)      731 (3)
1968    43.8 (1)      214.9 (1)     1226 (6)
1972    44.7 (4)      216.3 (3)      740 (4)
1976    44.2 (2)      219.2 (6)      595 (1)
1980    44.6 (3)      218.4 (5)      663 (2)

*Actual marathon times are 2 hours + entry.
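The calculations in the example can be sketched for one pair of events. The code below uses the 1500-meter and marathon columns as transcribed from Table 4.5 (neither has ties), computes $r_s$ and $\tau$, and standardizes them with the null variances $1/(n - 1)$ and $2(2n + 5)/[9n(n - 1)]$ quoted above. The helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# 1500-meter times and marathon times (seconds beyond 2 hours), 1896-1980,
# as transcribed from Table 4.5; neither series contains ties.
m1500 = [273.2, 246.0, 245.4, 252.0, 243.4, 236.8, 241.8, 233.6, 233.2, 231.2,
         227.8, 229.8, 225.2, 221.2, 215.6, 218.1, 214.9, 216.3, 219.2, 218.4]
marathon = [3530, 3585, 5333, 3084, 3318, 2215, 1956, 2483, 1977, 1896,
            1759, 2092, 1383, 1500, 916, 731, 1226, 740, 595, 663]


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r


def spearman(x, y):
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        s += ((x[j] > x[i]) - (x[j] < x[i])) * ((y[j] > y[i]) - (y[j] < y[i]))
    return 2 * s / (n * (n - 1))


n = len(m1500)
r_s = spearman(m1500, marathon)
tau = kendall_tau(m1500, marathon)
z_rs = r_s * sqrt(n - 1)                                   # r_s / sd under H0
z_tau = tau / sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))    # tau / sd under H0
print(round(r_s, 3), round(z_rs, 2), round(tau, 3), round(z_tau, 2))
```

The printed values should be close to the 1500-Marathon entries of Table 4.6; small differences in the last digit can arise from the order in which rounding is done.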

Table 4.6. Rank Correlations

                            Event
Test        400-1500 m      1500-Marathon    400-Marathon
$r_s$       .940 (4.10)     .905 (3.95)      .878 (3.83)
$\tau$      .800 (4.94)     .695 (4.29)      .695 (4.29)

Kendall (1970, Chapter 8) has shown that $\tau$ (unlike $r_s$) can be extended
to the case of partial correlation. Kendall points out that it is remarkable
(but apparently only a coincidence) that the partial $\tau$ has the same
structural form as the partial product-moment correlation. Hence he shows

$$
\tau_{XY \cdot Z} = \frac{\tau_{XY} - \tau_{XZ}\,\tau_{YZ}}
                         {\left[ (1 - \tau_{XZ}^2)(1 - \tau_{YZ}^2) \right]^{1/2}}
$$

is the partial rank correlation between X and Y with Z held fixed. In
the foregoing example, $\tau_{1500,M} = .695$. Further, it is not surprising that
Olympic times decrease with the year. Computations from Table 4.5 show
$\tau_{1500,\mathrm{year}} = \tau_{M,\mathrm{year}} = -.832$. Then we have $\tau_{1500,M \cdot \mathrm{year}} = .009$. Hence, once
we account for the trend with time, there is not much association left
between the 1500-meter run and the marathon. At present there are no tests
for the significance of the partial $\tau$.
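The partial coefficient is a one-line computation once the three pairwise values are available. The sketch below assumes, as the text states, that the partial $\tau$ has the same structural form as the partial product-moment correlation; the function name `partial_tau` is hypothetical, and the inputs are the pairwise values quoted in the example.

```python
from math import sqrt


def partial_tau(t_xy, t_xz, t_yz):
    """Partial rank correlation of X and Y with Z held fixed.

    Assumes the same structural form as the partial product-moment
    correlation, applied to the pairwise Kendall coefficients.
    """
    return (t_xy - t_xz * t_yz) / sqrt((1 - t_xz ** 2) * (1 - t_yz ** 2))


# Pairwise values quoted in the example: 1500 m vs marathon, and each vs year.
value = partial_tau(0.695, -0.832, -0.832)
print(round(value, 3))
```

Because both events are about equally correlated with the year, the numerator nearly cancels, which is what drives the partial coefficient toward zero.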


We now turn to a discussion of the relationship between rank correlation
and the tests in the one- and two-way layouts. Some of the connections are
obvious. For example, Page's test, (4.3.10), for an ordered alternative in the
two-way layout can be written as follows:

$$
Q = \sum_{j=1}^{k} \left[ j - \frac{k+1}{2} \right] \left[ R_j - \frac{n(k+1)}{2} \right]
  = \sum_{i=1}^{n} \left\{ \sum_{j=1}^{k} \left[ j - \frac{k+1}{2} \right]
      \left[ R_{ij} - \frac{k+1}{2} \right] \right\}.
\tag{4.4.10}
$$

The expression in the braces is the numerator of Spearman's $r_s$ computed
between the ith row and the hypothesized ordering. From (4.4.3), multiplying
and dividing by $\sum_{j=1}^{k} [j - (k+1)/2]^2 = k(k^2 - 1)/12$, we have

$$
Q = \frac{k(k^2 - 1)}{12} \sum_{i=1}^{n} r_{si},
\tag{4.4.11}
$$

where $r_{si}$ is Spearman's $r_s$ between the ith row and the hypothesized
ordering. Hence Q assesses the degree of agreement among the rows (or
blocks) with respect to the hypothesized ordering.
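The relation between Page's statistic and the row-wise Spearman coefficients can be checked on a small set of within-block rankings. In the sketch below the centered form of Page's statistic, $\sum_j [j - (k+1)/2][R_j - n(k+1)/2]$ with $R_j$ the jth column rank sum, is an assumption about (4.3.10); the data and function names are hypothetical.

```python
def spearman_with_order(row):
    """Spearman's r_s between a row of within-block ranks and the order 1..k."""
    k = len(row)
    mean = (k + 1) / 2
    num = sum((j - mean) * (r - mean) for j, r in enumerate(row, start=1))
    return num / (k * (k * k - 1) / 12)


def page_q(rank_rows):
    """Centered Page statistic (assumed form) from the column rank sums."""
    n, k = len(rank_rows), len(rank_rows[0])
    mean = (k + 1) / 2
    col_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    return sum((j - mean) * (rj - n * mean)
               for j, rj in enumerate(col_sums, start=1))


# Hypothetical rankings: 3 blocks (rows), 4 treatments (columns).
rank_rows = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
k = 4
q = page_q(rank_rows)
q_from_r = k * (k * k - 1) / 12 * sum(spearman_with_order(r) for r in rank_rows)
print(q, q_from_r)
```

Each block contributes its own correlation with the hypothesized ordering, so blocks that agree with the ordering push Q up and blocks that reverse it pull Q down.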

If there is no specified ordering with which to correlate the rows, we can
consider the average rank correlation among all $n(n - 1)/2$ pairs of rows in
a two-way layout. The average rank correlation then measures the degree of
agreement among the rows, but does not specify what they should agree
upon. We write the average Spearman correlation as

$$
\begin{aligned}
\bar{r}_s &= \frac{2}{n(n-1)} \mathop{\sum\sum}_{i<i'}
   \left\{ \sum_{j=1}^{k} \frac{[R_{ij} - (k+1)/2][R_{i'j} - (k+1)/2]}{k(k^2 - 1)/12} \right\} \\
 &= \frac{12}{n(n-1)k(k^2 - 1)} \sum_{j=1}^{k}
   \left\{ \mathop{\sum\sum}_{i \ne i'} [R_{ij} - (k+1)/2][R_{i'j} - (k+1)/2] \right\}.
\end{aligned}
\tag{4.4.12}
$$

Note that

$$
\left\{ \sum_{i=1}^{n} [R_{ij} - (k+1)/2] \right\}^2
 = \sum_{i=1}^{n} [R_{ij} - (k+1)/2]^2
 + \mathop{\sum\sum}_{i \ne i'} [R_{ij} - (k+1)/2][R_{i'j} - (k+1)/2].
\tag{4.4.13}
$$

Then, since $\sum_{j=1}^{k} \sum_{i=1}^{n} [R_{ij} - (k+1)/2]^2 = nk(k^2 - 1)/12$ and the left
side of (4.4.13) is $[R_j - n(k+1)/2]^2$, (4.4.12) becomes

$$
\bar{r}_s = \frac{12}{n(n-1)k(k^2 - 1)}
  \left\{ \sum_{j=1}^{k} [R_j - n(k+1)/2]^2 - nk(k^2 - 1)/12 \right\}.
\tag{4.4.14}
$$

This equation makes $\bar{r}_s$ much more practical since, rather than compute
$n(n - 1)/2$ correlations, we only need the k column rank sums. We can
also see the connection with Friedman's $K^*$, (4.3.5). We have

$$
\bar{r}_s = \frac{K^*}{(n-1)(k-1)} - \frac{1}{n-1}
$$

and, conversely,

$$
K^* = (n-1)(k-1)\,\bar{r}_s + (k - 1).
\tag{4.4.15}
$$

Hence not only is $\bar{r}_s$ easy to compute, but Friedman's $K^*$ is a linear
function of $\bar{r}_s$. This means that $K^*$ is a measure of the amount of
agreement among the rows (or blocks). When there is a high degree of
agreement, $K^*$ will be large and reject the null hypothesis of no treatment
effect.
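The shortcut through the column rank sums can be compared with the direct average over all pairs of rows. In the sketch below Friedman's statistic is taken in its usual form, $12\sum_j [R_j - n(k+1)/2]^2 / [nk(k+1)]$, which is assumed to agree with (4.3.5); the rankings and function names are hypothetical.

```python
from itertools import combinations


def spearman_rows(a, b):
    """Spearman's r_s between two rows of ranks 1..k (no ties)."""
    k = len(a)
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return 1 - 6 * d2 / (k * (k * k - 1))


def centered_column_ss(rank_rows):
    """Sum over columns of [R_j - n(k+1)/2]^2."""
    n, k = len(rank_rows), len(rank_rows[0])
    return sum((sum(row[j] for row in rank_rows) - n * (k + 1) / 2) ** 2
               for j in range(k))


def average_spearman_direct(rank_rows):
    """Average of r_s over all n(n-1)/2 pairs of rows."""
    pairs = list(combinations(rank_rows, 2))
    return sum(spearman_rows(a, b) for a, b in pairs) / len(pairs)


def average_spearman_from_sums(rank_rows):
    """Average r_s computed from the k column rank sums only."""
    n, k = len(rank_rows), len(rank_rows[0])
    ss = centered_column_ss(rank_rows)
    return 12 / (n * (n - 1) * k * (k * k - 1)) * (ss - n * k * (k * k - 1) / 12)


def friedman(rank_rows):
    """Friedman's statistic in its usual form (assumed to match (4.3.5))."""
    n, k = len(rank_rows), len(rank_rows[0])
    return 12 * centered_column_ss(rank_rows) / (n * k * (k + 1))


# Hypothetical rankings: 3 blocks (rows), 4 treatments (columns).
rank_rows = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
n, k = len(rank_rows), len(rank_rows[0])
r_bar = average_spearman_direct(rank_rows)
print(r_bar, average_spearman_from_sums(rank_rows))
print(friedman(rank_rows), (n - 1) * (k - 1) * r_bar + (k - 1))
```

The direct average needs $n(n-1)/2$ correlations, while the second form needs only one pass over the table, which is the practical point made in the text.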

This same two-way layout arises when n judges are asked to rank k
objects. Kendall (1970) calls this the problem of n-rankings. Then $\bar{r}_s$ is a
measure of the concordance or agreement among the judges. Kendall
introduces a coefficient of concordance

$$
W = \frac{12}{n^2 k(k^2 - 1)} \sum_{j=1}^{k} [R_j - n(k+1)/2]^2
  = \frac{1}{n(k-1)}\, K^*,
\tag{4.4.16}
$$

where $R_j$ is the sum of ranks for the jth object. In Exercise 4.5.16 you are
asked to show that $0 \le W \le 1$ and to express W, $\bar{r}_s$, and $K^*$ as linear
functions of each other.

If the null hypothesis of no concordance among the judges is interpreted
to mean that the judges act, as a group, as if they are assigning ranks at
random, then we can reject the null hypothesis, at approximate level $\alpha$, if
$W > \chi^2_\alpha(k - 1)/[n(k - 1)]$, where $\chi^2_\alpha(k - 1)$ is the chi-square critical
value with $k - 1$ degrees of freedom. In fact, W, $\bar{r}_s$, and $K^*$ are all
linearly related, so significance tests are identical for them all and may as
well be carried out using $K^*$, which is directly comparable to the chi-square
critical value.
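A short sketch of the coefficient of concordance follows. It assumes the usual form of Kendall's W, $12\sum_j [R_j - n(k+1)/2]^2 / [n^2 k(k^2 - 1)]$, and the usual form of Friedman's statistic; the rankings and function names are hypothetical.

```python
def centered_column_ss(rank_rows):
    """Sum over objects of [R_j - n(k+1)/2]^2, R_j the rank sum of object j."""
    n, k = len(rank_rows), len(rank_rows[0])
    return sum((sum(row[j] for row in rank_rows) - n * (k + 1) / 2) ** 2
               for j in range(k))


def kendall_w(rank_rows):
    """Coefficient of concordance (usual form, assumed)."""
    n, k = len(rank_rows), len(rank_rows[0])
    return 12 * centered_column_ss(rank_rows) / (n * n * k * (k * k - 1))


def friedman(rank_rows):
    """Friedman's statistic (usual form, assumed to match (4.3.5))."""
    n, k = len(rank_rows), len(rank_rows[0])
    return 12 * centered_column_ss(rank_rows) / (n * k * (k + 1))


# Hypothetical rankings: 3 judges (rows), 4 objects (columns).
judges = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
n, k = len(judges), len(judges[0])
w = kendall_w(judges)
print(w, friedman(judges) / (n * (k - 1)))

# Complete agreement among the judges gives the upper bound W = 1.
print(kendall_w([[1, 2, 3]] * 4))
```

Since W is just a rescaling of $K^*$, a test based on either statistic leads to the same decision.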

Schucany and Frawley (1973) extend the measure of concordance to the 
problem of two-group concordance. Suppose two groups of judges are 
asked to rank k objects. We might wish to know if there is agreement 
within the two groups and agreement between the two groups. 

Let $R_{ij}$, $i = 1, \ldots, m$; $j = 1, \ldots, k$, be the ranks assigned by m judges
in Group 1 and $R'_{ij}$, $i = 1, \ldots, n$; $j = 1, \ldots, k$, be the ranks assigned by n
judges in Group 2. Let $R_j$ and $R'_j$, $j = 1, \ldots, k$, be the respective rank
sums for each object. The Schucany-Frawley statistic is

$$
L^* = \sum_{j=1}^{k} R_j R'_j .
\tag{4.4.17}
$$

Under the null hypothesis that all rankings are uniformly distributed, the
mean and variance are given in Exercise 4.5.17, along with $\max L^*$ and
$\min L^*$.

A generalized coefficient of two-group concordance, which ranges from
$-1$ to $+1$, is defined by

$$
W^* = \frac{L^* - EL^*}{\max L^* - EL^*} .
\tag{4.4.18}
$$

The coefficient $W^*$ simply centers and rescales $L^*$ to the closed interval
$[-1, +1]$. The intuitive appeal of $W^*$ (or $L^*$) is provided in Exercise 4.5.18;
you are asked to show that

$$
W^* = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} r_{ij},
\tag{4.4.19}
$$

where $r_{ij}$ is Spearman's $r_s$ computed on the ith judge in Group 1 and the jth
judge in Group 2. The coefficient $W^*$ has the following interpretation:
large values approaching $+1$ mean high agreement within both groups and
high agreement across the groups; small values approaching $-1$ mean high
agreement within both groups and strong disagreement across the groups;
and values around 0 indicate either disagreement within the groups or
agreement within the groups but no association across the groups.
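The two-group coefficient can be computed from the rank sums and compared with the average of the between-group Spearman coefficients. The sketch below uses $EL^* = mnk(k+1)^2/4$ and $\max L^* = mnk(k+1)(2k+1)/6$, which follow from the rank-sum moments and are the quantities asked for in Exercise 4.5.17; the rankings and function names are hypothetical.

```python
def spearman_rows(a, b):
    """Spearman's r_s between two rankings of the same k objects (no ties)."""
    k = len(a)
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return 1 - 6 * d2 / (k * (k * k - 1))


def schucany_frawley(group1, group2):
    """Return (L*, W*) for two groups of complete rankings of k objects."""
    m, n, k = len(group1), len(group2), len(group1[0])
    r1 = [sum(row[j] for row in group1) for j in range(k)]
    r2 = [sum(row[j] for row in group2) for j in range(k)]
    l_star = sum(a * b for a, b in zip(r1, r2))
    mean_l = m * n * k * (k + 1) ** 2 / 4                # E L* under H0
    max_l = m * n * k * (k + 1) * (2 * k + 1) / 6        # max L*
    return l_star, (l_star - mean_l) / (max_l - mean_l)


# Hypothetical rankings of k = 4 objects by m = 2 and n = 3 judges.
group1 = [[1, 2, 3, 4], [2, 1, 3, 4]]
group2 = [[1, 3, 2, 4], [4, 3, 2, 1], [1, 2, 4, 3]]

l_star, w_star = schucany_frawley(group1, group2)
average_r = (sum(spearman_rows(a, b) for a in group1 for b in group2)
             / (len(group1) * len(group2)))
print(l_star, w_star, average_r)
```

In this small example one judge in Group 2 reverses the ordering favored by Group 1, which pulls the coefficient toward 0 even though the other judges largely agree.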

The hypothesis testing problem is a bit unusual since rejection of the null
hypothesis that all row permutations are equally likely entails two very
different decisions. The limiting distribution is given in the next theorem
and is also unusual. For fixed k, the limiting distribution, as $m, n \to \infty$, is
not normal. However, for moderate values of k, a normal approximation
will work. See Li and Schucany (1975) for a discussion.

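Theorem 4.4.3. Under the null hypothesis that all row permutations are
equally likely, for fixed k and as $m, n \to \infty$,

$$
\frac{L^* - EL^*}{(\mathrm{Var}\, L^*)^{1/2}} \;\xrightarrow{D}\;
\frac{1}{(k - 1)^{1/2}} \sum_{j=1}^{k-1} V_j W_j ,
$$

where $V_1, \ldots, V_{k-1}, W_1, \ldots, W_{k-1}$ are independent n(0, 1) random
variables.

Proof. First note, by (4.3.3) with $c_j = 1$,

$$
\frac{1}{(mn)^{1/2}} (L^* - EL^*)
 = \frac{1}{m^{1/2}} \left( \mathbf{R} - \mathbf{J}\, m(k+1)/2 \right)'
   \frac{1}{n^{1/2}} \left( \mathbf{R}^* - \mathbf{J}\, n(k+1)/2 \right),
$$

where $\mathbf{R}$ and $\mathbf{R}^*$ are the column vectors of rank sums $(R_1, \ldots, R_k)$ and
$(R'_1, \ldots, R'_k)$, $\mathbf{J}' = (1, \ldots, 1)$, $ER_j = m(k+1)/2$, and
$ER'_j = n(k+1)/2$. The result in Exercise 4.5.7, along with Theorem A2,
implies that

$$
\frac{1}{(mn)^{1/2}} (L^* - EL^*) \;\xrightarrow{D}\; \mathbf{Z}_1' \mathbf{Z}_2 ,
$$

where $\mathbf{Z}_i$ is $MVN(\mathbf{0}, \mathbf{B})$, $i = 1, 2$, and $\mathbf{B}$ is defined by (4.3.4) with
$c_j = 1$. The vectors $\mathbf{Z}_1$ and $\mathbf{Z}_2$ are independent. Further, from (4.3.4),

$$
\mathbf{B} = \frac{k(k+1)}{12} \left[ \mathbf{I} - \frac{1}{k}\, \mathbf{J}\mathbf{J}' \right].
$$

Now, for $i = 1, 2$, let the vector $\mathbf{Y}_i = \{12/[k(k+1)]\}^{1/2}\, \mathbf{Z}_i$. Then $\mathbf{Y}_i$ is
distributed as $MVN(\mathbf{0}, \mathbf{I} - (1/k)\mathbf{J}\mathbf{J}')$ and the covariance matrix is
idempotent with rank $k - 1$. Hence there exists an orthogonal matrix $\Gamma$ such that
$\mathbf{U}_i = \Gamma' \mathbf{Y}_i$, $i = 1, 2$, are distributed as $MVN(\mathbf{0}, \mathbf{D})$, where $\mathbf{D}$ is a diagonal
matrix with $k - 1$ ones and 1 zero on the diagonal. This means that the
components of $\mathbf{U}_1$ and $\mathbf{U}_2$ are $k - 1$ i.i.d. n(0, 1) random variables and one
0 with probability one.

We now apply these transformations to rewrite $\mathbf{Z}_1' \mathbf{Z}_2$ as

$$
\mathbf{Z}_1' \mathbf{Z}_2 = \frac{k(k+1)}{12}\, \mathbf{Y}_1' \mathbf{Y}_2
 = \frac{k(k+1)}{12}\, \mathbf{U}_1' \Gamma' \Gamma\, \mathbf{U}_2
 = \frac{k(k+1)}{12} \sum_{j=1}^{k-1} V_j W_j ,
$$

with $V_1, \ldots, V_{k-1}, W_1, \ldots, W_{k-1}$ given in the statement of the theorem.
From Exercise 4.5.17, $\mathrm{Var}\, L^* = mn(k - 1)k^2(k + 1)^2/144$. Hence

$$
\frac{L^* - EL^*}{(\mathrm{Var}\, L^*)^{1/2}}
 = \left[ \frac{144}{(k - 1)k^2(k + 1)^2} \right]^{1/2} \frac{1}{(mn)^{1/2}} (L^* - EL^*)
 \;\xrightarrow{D}\; \frac{1}{(k - 1)^{1/2}} \sum_{j=1}^{k-1} V_j W_j
$$

and the proof is complete.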


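In the next theorem we construct the moment-generating function of
$\sum V_j W_j$ and discuss the limiting distribution.

Theorem 4.4.4. Suppose $V_1, \ldots, V_{k-1}, W_1, \ldots, W_{k-1}$ are i.i.d. n(0, 1).
Then the moment-generating function of $\sum_{j=1}^{k-1} V_j W_j = \mathbf{V}'\mathbf{W}$ is given by

$$
M(t) = (1 - t^2)^{-(k-1)/2} .
$$

Proof. Let $\mathbf{V}' = (V_1, \ldots, V_{k-1})$ and $\mathbf{W}' = (W_1, \ldots, W_{k-1})$. We use a
conditional expectation to find

$$
M(t) = E(e^{t\mathbf{V}'\mathbf{W}}) = E\{E(e^{t\mathbf{V}'\mathbf{W}} \mid \mathbf{V} = \mathbf{v})\}.
$$

But the conditional expectation is the moment-generating function for $\mathbf{v}'\mathbf{W}$,
which is distributed as $n(0, \mathbf{v}'\mathbf{v})$. Hence

$$
M(t) = E(e^{t^2 \mathbf{V}'\mathbf{V}/2}) = (1 - t^2)^{-(k-1)/2},
$$

since $\mathbf{V}'\mathbf{V}$ is $\chi^2(k - 1)$. This completes the proof.

Now note that

$$
\frac{L^* - EL^*}{(\mathrm{Var}\, L^*)^{1/2}} \;\xrightarrow{D}\; \frac{1}{(k - 1)^{1/2}}\, \mathbf{V}'\mathbf{W}.
$$

Hence the moment-generating function of the limiting distribution is

$$
M\!\left( \frac{t}{(k - 1)^{1/2}} \right) = \left( 1 - \frac{t^2}{k - 1} \right)^{-(k-1)/2} \doteq e^{t^2/2}.
$$

The approximation is valid as $k \to \infty$, and $e^{t^2/2}$ is the moment-generating
function for n(0, 1). Hence $L^*$, when standardized, is approximately
normally distributed only for large m, n, and k. Li and Schucany (1975)
suggest that $k > 6$ provides an adequate normal approximation for large m
and n. Oddly, if m or n is small, then k need not be so large.

Finally, as an example of a nonnormal limiting distribution, we consider
what happens if $k = 3$. In this case the limiting moment-generating function
is $1/(1 - t^2/2)$. But this is the moment-generating function for the double
exponential distribution, $f(x) = 2^{-1/2} e^{-\sqrt{2}\,|x|}$ for $-\infty < x < \infty$. Thus, for
$k = 3$ and $m, n \to \infty$,

$$
\frac{L^* - EL^*}{(\mathrm{Var}\, L^*)^{1/2}} \;\xrightarrow{D}\; Z,
$$

where Z has a double exponential distribution.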
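To see how quickly the limiting distribution approaches normality as k grows, we can compare moment-generating functions numerically. The sketch below starts from the result of Theorem 4.4.4, $M(t) = (1 - t^2)^{-(k-1)/2}$, rescales by $(k - 1)^{1/2}$, and compares with $e^{t^2/2}$; the function names and the evaluation point are arbitrary choices for this illustration.

```python
from math import exp


def limiting_mgf(t, k):
    """MGF of V'W / sqrt(k - 1), obtained by rescaling (1 - t^2)^(-(k-1)/2).

    Valid for t^2 < k - 1.
    """
    return (1 - t * t / (k - 1)) ** (-(k - 1) / 2)


def normal_mgf(t):
    """MGF of the standard normal distribution."""
    return exp(t * t / 2)


t = 1.0
for k in (3, 6, 20, 100):
    print(k, limiting_mgf(t, k), normal_mgf(t))
```

For k = 3 the function reduces to $1/(1 - t^2/2)$, the double exponential case discussed above, and the gap to the normal value shrinks as k increases.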

Hollander and Sethuraman (1978) argue that Schucany and Frawley
(1973) do not consider the correct null hypothesis. They reformulate the
problem and propose a conditionally distribution-free test. Their paper also
contains comments by Schucany.

We have shown that the tests of Friedman and Page in the two-way layout
are intimately connected to Spearman's rank correlation. We now turn to
the one-way layout and explore the connection between rank correlation
and the Kruskal-Wallis statistic. We must replace the familiar
product-moment correlation coefficient with the intraclass correlation coefficient;
see Fisher (1970, Chapter 7). The need for the intraclass correlation
coefficient arises when there is no natural way to order X and Y in the pair
(X, Y). For example, if we wish to measure the correlation in the IQ of
twins, there is no way to say which twin should be listed first.

Given $(X_1, Y_1), \ldots, (X_k, Y_k)$, create k additional pairs $(Y_1, X_1), \ldots,$
$(Y_k, X_k)$ and compute the product-moment correlation coefficient on the 2k
pairs. This removes the order effect within the pairs. The result is

$$
r_I = \frac{2 \sum_{i=1}^{k} (X_i - M)(Y_i - M)}
           {\sum_{i=1}^{k} (X_i - M)^2 + \sum_{i=1}^{k} (Y_i - M)^2},
$$

where $M = (\bar{X} + \bar{Y})/2$. Hence, $r_I$, the intraclass correlation coefficient,
replaces the individual sample means and sample sums of squares by their
averages.
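The construction by doubling the pairs can be checked against the closed form. In the sketch below the ordinary product-moment correlation is applied to the 2k pairs obtained by listing each pair in both orders, and the result is compared with $2\sum (X_i - M)(Y_i - M) / [\sum (X_i - M)^2 + \sum (Y_i - M)^2]$. The twin scores and function names are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Ordinary product-moment correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


def intraclass_by_doubling(x, y):
    """Correlation over the 2k pairs (X_i, Y_i) and (Y_i, X_i)."""
    return pearson(x + y, y + x)


def intraclass_formula(x, y):
    """Closed form with the common mean M = (mean of X + mean of Y)/2."""
    k = len(x)
    m = (sum(x) + sum(y)) / (2 * k)
    num = 2 * sum((a - m) * (b - m) for a, b in zip(x, y))
    den = sum((a - m) ** 2 for a in x) + sum((b - m) ** 2 for b in y)
    return num / den


# Hypothetical scores for four pairs of twins.
x = [100, 110, 95, 120]
y = [104, 108, 99, 115]
print(intraclass_by_doubling(x, y), intraclass_formula(x, y))

# Swapping the members of any pair leaves the coefficient unchanged.
print(intraclass_formula([104, 110, 95, 120], [100, 108, 99, 115]))
```

The last line illustrates the point of the construction: the coefficient does not depend on which member of a pair is listed first.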

Next consider the one-way layout with n observations under each of k
treatments. The preceding discussion is relevant for a one-way layout with
n = 2. In general, there are $n(n - 1)/2$ possible pairs under each treatment
and if we add the reversed pair in each case we have $n(n - 1)$ pairs from
each treatment. The intraclass correlation immediately generalizes to

$$
r_I = \frac{\sum_{j=1}^{k} \mathop{\sum\sum}_{i \ne i'} (X_{ij} - M)(X_{i'j} - M)}
           {(n - 1) \sum_{j=1}^{k} \sum_{i=1}^{n} (X_{ij} - M)^2},
\tag{4.4.20}
$$

where M is the grand mean.

The intraclass rank correlation in the one-way layout is computed by
applying $r_I$ to the nk ranks of the combined data. In this case $M = (kn + 1)/2$
and $\sum_{j} \sum_{i} (R_{ij} - M)^2 = nk(n^2k^2 - 1)/12$, so that

$$
\begin{aligned}
R_I &= \frac{24 \sum_{j=1}^{k} \mathop{\sum\sum}_{i < i'}
        (R_{ij} - (kn + 1)/2)(R_{i'j} - (kn + 1)/2)}{(n - 1)nk(n^2k^2 - 1)} \\
    &= \frac{12}{(n - 1)nk(n^2k^2 - 1)}
       \left\{ \sum_{j=1}^{k} [R_j - n(nk + 1)/2]^2 - \frac{nk(n^2k^2 - 1)}{12} \right\}.
\end{aligned}
\tag{4.4.21}
$$

See Exercise 4.5.20 for the second equality.

When the sample sizes are equal, $N = nk$, and from Theorem 4.2.4, we
have

$$
\sum_{j=1}^{k} [R_j - n(nk + 1)/2]^2 = \frac{n(nk)(nk + 1)}{12}\, H^*,
$$

where $H^*$ is the Kruskal-Wallis statistic. Hence, the intraclass rank
correlation coefficient is

$$
R_I = \frac{nH^*}{(n - 1)(nk - 1)} - \frac{1}{n - 1}
\tag{4.4.22}
$$

and, also,

$$
H^* = \frac{(nk - 1)}{n} \left[ (n - 1)R_I + 1 \right].
$$

Thus when the intraclass rank correlation is large, there is a high degree of
agreement among the ranks under each treatment. In this case, the
Kruskal-Wallis statistic will be large and reject the null hypothesis of no
treatment effect. We also note that $H^*$ can be used to test for significant
intraclass rank correlation provided that n is large so that Theorem 4.2.4
can be applied. Since $H^* \ge 0$, (4.4.22) shows that $-1/(n - 1) \le R_I \le 1$.
This asymmetry requires that some care be exercised in the interpretation
of $R_I$. There is a similar relationship between the one-way F statistic and $r_I$.

When the sample sizes are unequal, the intraclass correlations can still
be computed, but they are no longer linearly related to the one-way layout
test statistics.
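The link between the intraclass rank correlation and the Kruskal-Wallis statistic can be verified on a small balanced layout. The sketch below computes $R_I$ from all ordered within-treatment pairs of ranks and again from $H^*$, taken in its usual equal-sample-size form $12\sum_j [R_j - n(N+1)/2]^2 / [nN(N+1)]$ with $N = nk$, which is assumed to agree with Theorem 4.2.4. The ranks and function names are hypothetical.

```python
def intraclass_rank_from_pairs(rank_groups):
    """R_I from all ordered pairs of distinct ranks within each treatment."""
    k, n = len(rank_groups), len(rank_groups[0])
    big_n = n * k
    m = (big_n + 1) / 2                        # mean of the combined ranks
    num = sum((g[i] - m) * (g[j] - m)
              for g in rank_groups
              for i in range(n) for j in range(n) if i != j)
    den = (n - 1) * big_n * (big_n ** 2 - 1) / 12
    return num / den


def kruskal_wallis(rank_groups):
    """Kruskal-Wallis statistic for equal sample sizes (usual form, assumed)."""
    k, n = len(rank_groups), len(rank_groups[0])
    big_n = n * k
    ss = sum((sum(g) - n * (big_n + 1) / 2) ** 2 for g in rank_groups)
    return 12 * ss / (n * big_n * (big_n + 1))


# Hypothetical combined-sample ranks: k = 3 treatments, n = 3 observations each.
rank_groups = [[1, 2, 4], [3, 5, 7], [6, 8, 9]]
k, n = len(rank_groups), len(rank_groups[0])

r_i = intraclass_rank_from_pairs(rank_groups)
h = kruskal_wallis(rank_groups)
r_i_from_h = n * h / ((n - 1) * (n * k - 1)) - 1 / (n - 1)
print(r_i, r_i_from_h, h)
```

When all rank sums are equal, $H^* = 0$ and the coefficient takes its lower bound $-1/(n - 1)$, which is the asymmetry noted in the text.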


4.5. EXERCISES

4.5.1. In the proof of Theorem 4.2.2 show that $\mathrm{Cov}(V_i, V_j) \to -c_i c_j/12$.

4.5.2. Verify that $H^*$ reduces to the other equations in the statement of
Theorem 4.2.4.

4.5.3. Verify (4.2.12) for the variance of L. Hint: Note that
$\sum_{j=1}^{k} [j - (k+1)/2]^2 = k(k^2 - 1)/12$ and

$$
2 \mathop{\sum\sum}_{i<j} [i - (k+1)/2][j - (k+1)/2] = -\sum_{j=1}^{k} [j - (k+1)/2]^2 .
$$

4.5.4. Let $J = \mathop{\sum\sum}_{i<j} W_{ij}$ be given by (4.2.13). Under $H_0$, show that:

$$
\mathrm{Cov}(W_{uv}, W_{ut}) = \mathrm{Cov}(W_{vu}, W_{tu}) = n_u n_v n_t/12
$$

$$
\mathrm{Cov}(W_{uv}, W_{tu}) = \mathrm{Cov}(W_{vu}, W_{ut}) = -n_u n_v n_t/12 .
$$

Hint: Note that $W_{uv} + W_{ut}$ is the Mann-Whitney-Wilcoxon statistic
computed on the uth sample versus the combined vth and tth
samples and $\mathrm{Var}(W_{uv} + W_{ut}) = \mathrm{Var}\, W_{uv} + \mathrm{Var}\, W_{ut} + 2\,\mathrm{Cov}(W_{uv}, W_{ut})$.

4.5.5. Suppose we have k samples, each of size n; hence N = nk. Show 
L, (4.2.11), can be expressed as 



i<j 


nk{k^ - 1 ) 


Hence L is equivalent to a weighted sum of pairwise Mann- 
Whitney-Wilcoxon statistics. 

4.5.6. Verify the moments for $R_1, \ldots, R_k$ in the two-way layout, under
the null hypothesis, given in (4.3.1).

4.5.7. a. Let the vector $\mathbf{T}' = (T_1, \ldots, T_k)$ have jth component $T_j$ given
by (4.3.3). Under the null hypothesis, show that $\mathbf{T}$ has a
$MVN(\mathbf{0}, \mathbf{B})$ limiting distribution, where $\mathbf{B}$ is defined by (4.3.4).
Hint: Note that the rows of ranks in the two-way layout are
i.i.d. vectors so that the multivariate central limit theorem,
discussed below Theorem A8, can be directly applied.
b. Show that if $c_j = (k - 1)/k$, $j = 1, \ldots, k$, then $K^*$, (4.3.5),
has an asymptotic chi-square distribution with $k - 1$ degrees
of freedom. Hint: Apply Theorem 4.2.3.

4.5.8. In the two-way layout, suppose are independent and have 

cdf Fix -II -a,- / = 1, . . . , n, y = 1, . . . , Let e; 

= (/I + a, -1- — , p + and 





a Similar to Theorem 426, argue that, when 

£,.r/-^<;V'2W’- !)//’«■'«{/!/ - ^ ). 

where 

b. Let $c_j = (k - 1)/k$ and argue that the Friedman statistic $K^*$,
(4.3.5), is asymptotically noncentral chi-square with $k - 1$
degrees of freedom and noncentrality parameter

4.5.9. Show that Friedman's statistic reduces to the sign statistic when
$k = 2$.

4.5.10. The two-way layout with m observations per cell. Consider a
model in which the independent random variables $X_{ijt}$, $i = 1,$
$\ldots, n$, $j = 1, \ldots, k$, $t = 1, \ldots, m$, have cdf $F(x - \mu - \alpha_i - \beta_j)$,
$F \in \Omega_0$. We wish to test $H_0: \beta_1 = \cdots = \beta_k$ versus
$H_A: \beta_1, \ldots, \beta_k$ not all equal. Rank the data within the ith block,
$i = 1, \ldots, n$; hence rank from 1 to mk. Let $R_j$ be the sum of
ranks for the jth treatment, $j = 1, \ldots, k$. Under $H_0$, show

$$
\begin{aligned}
ER_j &= nm(mk + 1)/2 \\
\mathrm{Var}\, R_j &= nm^2(mk + 1)(k - 1)/12 \\
\mathrm{Cov}(R_i, R_j) &= -nm^2(mk + 1)/12 .
\end{aligned}
$$

Further, argue that

$$
\frac{12}{nm^2 k(mk + 1)} \sum_{j=1}^{k} [R_j - nm(mk + 1)/2]^2
$$

is asymptotically chi-square with $k - 1$ degrees of freedom. See
Exercise 5.5.8 for an application with data.

4.5.11. Show that Page's statistic Q, (4.3.10), has mean 0 and variance
given by (4.3.11). Further, show that $Q/(\mathrm{Var}\, Q)^{1/2}$ is
asymptotically n(0, 1).

4.5.12. The statistic $J^*$, (4.3.12), can be used to test for an ordered
alternative in a two-way layout with several observations per cell.
Suppose there are k treatments, n blocks, and $n_{ij}$ observations in
the $(i, j)$ cell. Suppose the null hypothesis $H_0: \beta_1 = \cdots = \beta_k$ is
true. Let $n_i = \sum_{j=1}^{k} n_{ij}$ and show that:

$$
EJ^* = \sum_{i=1}^{n} \left[ n_i^2 - \sum_{j=1}^{k} n_{ij}^2 \right] / 4
$$

$$
\mathrm{Var}\, J^* = \frac{1}{72} \sum_{i=1}^{n}
  \left[ n_i^2 (2n_i + 3) - \sum_{j=1}^{k} n_{ij}^2 (2n_{ij} + 3) \right].
$$

If the $n_{ij}$ remain fixed and $n \to \infty$, use Theorem A6 to show that
$(J^* - EJ^*)/(\mathrm{Var}\, J^*)^{1/2}$ is asymptotically n(0, 1).

4.5.13. The balanced incomplete block design: Durbin's (1951) test. In
this design there are n blocks (judges) and t treatments. There are
$k < t$ treatments ranked within each block, every treatment
appears in $r < n$ blocks, and every treatment appears with every
other treatment an equal number of times. These designs are
discussed by Cochran and Cox (1957). Let $R_j$ be the sum of ranks
under the jth treatment, $j = 1, \ldots, t$. Under the null hypothesis of
no treatment effect, show

$$
\begin{aligned}
ER_j &= r(k + 1)/2 \\
\mathrm{Var}\, R_j &= r(k^2 - 1)/12 \\
\mathrm{Cov}(R_i, R_j) &= -r(k^2 - 1)/[12(t - 1)].
\end{aligned}
$$

Argue that, under the null hypothesis,

$$
\frac{12(t - 1)}{rt(k^2 - 1)} \sum_{j=1}^{t} [R_j - r(k + 1)/2]^2
$$

has an asymptotic chi-square distribution with $t - 1$ degrees of
freedom.

4.5.14. Show that (4.4.4) holds. Hint: Expand $\sum_{i=1}^{n} (i - R_i)^2$.

4.5.15. Suppose X and Y are independent and show that $\mathrm{Var}\, r_s = 1/(n - 1)$
for $r_s$ given by (4.4.4). We have already seen $E r_s = 0$. Let
$S^* = \sum_{i=1}^{n} [i - (n + 1)/2] R_i$ in (4.4.4). Show the projection is
given by

$$
E[S^* \mid Y_i = y] = n\,[i - (n + 1)/2]\,[F_Y(y) - 1/2],
$$

where $F_Y(y)$ is the marginal cdf of Y. Then show $(n - 1)^{1/2} r_s$ is
asymptotically n(0, 1). Hint: It is helpful to write
$R_j = 1 + \sum_{i \ne j} s(Y_j - Y_i)$ for the rank of $Y_j$. First compute
$E[s(Y_j - Y_i) - 1/2 \mid Y_k = y]$ and then compute
$E[R_j - (n + 1)/2 \mid Y_k = y]$.

4.5.16. Show that Kendall's coefficient of concordance W, (4.4.16),
satisfies $0 \le W \le 1$. Further, express W, $\bar{r}_s$, and $K^*$ as linear
functions of each other.

4.5.17. Show that, under the null hypothesis that all row permutations are
equally likely, we have, for the Schucany-Frawley statistic $L^*$,
(4.4.17),

$$
EL^* = \frac{mnk(k + 1)^2}{4}, \qquad
\mathrm{Var}\, L^* = \frac{mn(k - 1)k^2(k + 1)^2}{144}.
$$

Further,

$$
\min L^* = \frac{mnk(k + 1)(k + 2)}{6}
$$

and

$$
\max L^* = \frac{mnk(k + 1)(2k + 1)}{6}.
$$

4.5.18. Verify (4.4.19), which connects $W^*$ to the average of the rank
correlation coefficients between the two groups.

4.5.19. Show that, under the null hypothesis that all row permutations are
equally likely, the Schucany-Frawley statistic $L^*$ is uncorrelated
with $K_1^*$ and $K_2^*$, the Friedman statistics on the individual groups.

4.5.20. Using an argument similar to the one used to establish (4.4.14),
establish (4.4.21).

4.5.21. Suppose we have two samples of size n. Let U, (3.2.2), be the sum
of ranks of the first sample, in the combined sample. Let $R_I$ be the
intraclass rank correlation coefficient and show that

$$
R_I = \frac{12}{n(n - 1)(4n^2 - 1)} \left[ U - \frac{n(2n + 1)}{2} \right]^2 - \frac{1}{n - 1}.
$$

This provides the connection between the intraclass correlation
coefficient and the Mann-Whitney-Wilcoxon statistic.



CHAPTER 5 


The Linear Model 


5.1. INTRODUCTION AND SIMPLE REGRESSION

We now turn to the linear model. In this book we have concentrated on
models described by a set of independent observations $Y_1, \ldots, Y_N$ with
respective cdfs $F(y - \theta_1), \ldots, F(y - \theta_N)$. Major problems were
formulated in terms of hypothesis tests and estimation of the location
parameters $\theta_1, \ldots, \theta_N$. In the one-sample location model the unknown
locations are the same: $\theta_1 = \cdots = \theta_N = \theta$. In the two-sample location
model $\theta_1 = \cdots = \theta_n = \mu_1$ and $\theta_{n+1} = \cdots = \theta_N = \mu_2$, and attention was
focused on $\Delta = \mu_1 - \mu_2$. The one-way layout extends the two-sample model
to several samples. In the two-way layout it is more convenient to use
double subscripts on the observations to indicate the two factors, for
example, treatment and block. Hence we have $Y_{ij}$, $i = 1, \ldots, n$,
$j = 1, \ldots, k$, with cdf $F_{ij}(y - \theta_{ij})$. In this chapter we will be interested in the
particular case $F_{ij}(y - \theta_{ij}) = F(y - \theta_{ij})$ in which the nk observations are
independent. The data could be displayed in a two-way table in which i
labels the n rows (blocks) and j labels the k columns (treatments). Then the
additive two-way model is specified by $\theta_{ij} = \mu + \alpha_i + \beta_j$, $i = 1, \ldots, n$,
$j = 1, \ldots, k$. Hypotheses are formulated in terms of the treatment effects
$\beta_1, \ldots, \beta_k$ with $\mu$ and $\alpha_1, \ldots, \alpha_n$ treated as nuisance parameters. In the
last chapter, in the discussion of the two-way layout, we avoided estimating
the nuisance parameters by ranking within the blocks. Equation (4.3.9)
indicated a loss of efficiency due to ranking in this way. The methods
developed in the present chapter recover the lost efficiency but they are
only asymptotically nonparametric.

In the general linear model we have a vector V of 

independent observations with cdfs F(y ~ 0) F(y ~ 0f/) F B fio 
The pnme denotes the transpose of the vector The lineanty is imposed on 
222 
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Bi by supposing that it is a linear function of p given, independent 
variables; x,-,, . . . , x,^. Hence 

0, = x,,/3,+ ••• (5.1.1) 

The coefficients . . . , are the unknown parameters. If we denote the 
relevant vectors by xj = (x,„ . . . , X/p)and then we can 

write 

0, = x;.^. (5.1.2) 

We could also write Y, = 0,- + e,- = + e,- where e' == (e„ . • . , e^v) is a 

vector of i.i.d. random variables with cdf F G Rq- 

Let X be the W X p matrix in which x'. is the /th row; then, in matrix 
notation, the linear model becomes 

Y = X/3 + e. (5.1.3) 

This general linear model covers regression, analysis of variance, and
analysis of covariance designs; see Section 5.4 for examples with data.

If we let $\mathbf{X}$ be an $N \times 1$ vector of ones, then we have the one-sample
location model. If we let

$$\mathbf{X}' = \begin{bmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ 1 & \cdots & 1 & 0 & \cdots & 0 \end{bmatrix},$$

where the second column of $\mathbf{X}$ contains $n$ ones, then we have the two-sample
location model with $n$ and $N - n$ observations in the respective samples. In
this case $\theta_i = \beta_1 + \beta_2$ for $i = 1, \ldots, n$, and $\theta_i = \beta_1$ for $i = n + 1, \ldots, N$,
and $\Delta = \beta_2$. Next let

$$\mathbf{X} = \begin{bmatrix} \mathbf{1} & \mathbf{1} & \mathbf{1} \\ \mathbf{1} & \mathbf{1} & -\mathbf{1} \\ \mathbf{1} & -\mathbf{1} & \mathbf{1} \\ \mathbf{1} & -\mathbf{1} & -\mathbf{1} \end{bmatrix},$$

where each smaller $\mathbf{1}$ denotes $n$ ones. This provides the design for a
two-way layout with 2 treatments, 2 blocks, and $n$ observations per cell. In
this example, $\theta_{11} = \beta_1 + \beta_2 + \beta_3$, $\theta_{12} = \beta_1 + \beta_2 - \beta_3$, $\theta_{21} = \beta_1 - \beta_2 + \beta_3$,
and $\theta_{22} = \beta_1 - \beta_2 - \beta_3$, so that $\beta_1$ corresponds to the grand mean, $2\beta_2$
represents the change in passing from the first to second block, and $2\beta_3$
represents the change in levels of the treatment. The hypothesis of no
treatment effect is $H_0: \beta_3 = 0$. Draper and Smith (1981, Chapter 9) discuss
multiple regression as applied to analysis of variance problems with special
attention to the two-way layout. See Example 5.4.2 for a worked example
with data.
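The 2 x 2 layout can be checked numerically. The following is a minimal sketch, not taken from the text: the helper names `two_way_design` and `cell_locations` and the parameter values are hypothetical, and the rows are assumed to be ordered by cell (1,1), (1,2), (2,1), (2,2) with the first index denoting the block.

```python
def two_way_design(n):
    """Design for 2 blocks x 2 treatments with n observations per cell.

    Columns: grand mean, block contrast, treatment contrast.
    Cells are ordered (1,1), (1,2), (2,1), (2,2); first index = block.
    """
    cells = [(1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]
    return [list(row) for row in cells for _ in range(n)]


def cell_locations(X, beta):
    """theta_i = x_i' beta for every row x_i' of X."""
    return [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]


X = two_way_design(n=2)
beta = (10.0, 2.0, 3.0)  # made-up values of (beta_1, beta_2, beta_3)
theta = cell_locations(X, beta)
theta_11, theta_12, theta_21, theta_22 = theta[0], theta[2], theta[4], theta[6]
```

With these made-up values the block change is `theta_11 - theta_21`, which equals twice the second coefficient, and the treatment change is `theta_11 - theta_12`, which equals twice the third.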

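Before turning to the general estimation and hypothesis testing problems
in the linear model, we will discuss the simple regression model. In this case
$Y_i = \beta_1 + x_i\beta_2 + e_i$, $i = 1, \ldots, N$. The independent variables $x_1, \ldots, x_N$
are given numbers, and we concentrate on the unknown slope parameter $\beta_2$.
The standard test of $H_0: \beta_2 = 0$ versus $H_A: \beta_2 \ne 0$ is based on
$\sum (x_i - \bar{x}) Y_i$. As in the earlier chapters this suggests a rank test statistic

$$U = \sum_{i=1}^{N} (x_i - \bar{x}) R_i, \tag{5.1.4}$$

where $R_1, \ldots, R_N$ are the ranks of $Y_1, \ldots, Y_N$. We reject $H_0: \beta_2 = 0$ for
extreme values of $U$. We need the moments of $U$ and its asymptotic
distribution under $H_0: \beta_2 = 0$ in order to approximate the critical values.

Under $H_0: \beta_2 = 0$, $Y_1, \ldots, Y_N$ are i.i.d. random variables with cdf
$F(y - \beta_1)$, $F \in \Omega_0$. Hence Exercise 3.7.1 provides the distribution theory
for $R_1, \ldots, R_N$. Further, $EU = 0$ and the variance of $U$ is given by

$$\operatorname{Var} U = \sum_i (x_i - \bar{x})^2 \operatorname{Var} R_1 + \sum_{i \ne j} (x_i - \bar{x})(x_j - \bar{x}) \operatorname{Cov}(R_1, R_2) = \frac{N(N+1)}{12} \sum_{i=1}^{N} (x_i - \bar{x})^2, \tag{5.1.5}$$

where we have used $\operatorname{Var} R_1 = (N^2 - 1)/12$, $\operatorname{Cov}(R_1, R_2) = -(N+1)/12$, and the
identity $0 = [\sum (x_i - \bar{x})]^2 = \sum (x_i - \bar{x})^2 + \sum_{i \ne j} (x_i - \bar{x})(x_j - \bar{x})$.
In the next theorem we provide the projection of $U$
and discuss the limiting distribution of $U$.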
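The variance formula (5.1.5) can be confirmed by brute force for a small design, since under the null hypothesis every permutation of the ranks is equally likely. The sketch below is illustrative only: the function names and the design points are made up, and exact rational arithmetic is used so that the comparison is exact.

```python
from fractions import Fraction
from itertools import permutations


def exact_null_variance(x):
    """Var U under H0, found by enumerating all N! equally likely rank vectors."""
    n = len(x)
    xbar = Fraction(sum(x), n)
    values = [sum((xi - xbar) * r for xi, r in zip(x, ranks))
              for ranks in permutations(range(1, n + 1))]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def formula_variance(x):
    """N(N+1)/12 times the sum of squared deviations of x, as in (5.1.5)."""
    n = len(x)
    xbar = Fraction(sum(x), n)
    return Fraction(n * (n + 1), 12) * sum((xi - xbar) ** 2 for xi in x)


x = [1, 2, 4, 7, 11]  # arbitrary made-up design points
exact = exact_null_variance(x)
closed_form = formula_variance(x)
```

Enumeration is feasible only for very small N; it is used here purely as a check on the closed form.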


Theorem 5.1.1. In the simple regression model suppose $\beta_2 = 0$.
Further suppose that

$$\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 \to S^2, \qquad 0 < S^2 < \infty.$$

Then

$$U^* = \frac{1}{(N+1)\sqrt{N}} \sum_{i=1}^{N} (x_i - \bar{x})\left(R_i - \frac{N+1}{2}\right) \xrightarrow{D} Z \sim n(0, S^2/12).$$

Proof. The argument is similar to that of Theorem 3.2.4, and Exercise
4.5.15 is a special case. First, note that the rank of $Y_j$ among $Y_1, \ldots, Y_N$
can be written as

$$R_j = 1 + \sum_{i \ne j} s(Y_j - Y_i),$$

where $s(x) = 1$ if $x > 0$ and 0 otherwise. Then, since

$$E[s(Y_j - Y_i) \mid Y_k = y] = \begin{cases} F(y) & k = j \\ 1 - F(y) & k = i \\ 1/2 & k \ne i \text{ or } j, \end{cases}$$

we have

$$E[R_j \mid Y_k = y] = 1 + \sum_{i \ne j} E[s(Y_j - Y_i) \mid Y_k = y] = \begin{cases} 1 + (N-1)F(y) & k = j \\ 1 + (N-2)/2 + (1 - F(y)) & k \ne j. \end{cases}$$

Then, using a little algebra, we have

$$E[U^* \mid Y_k = y] = \frac{1}{(N+1)\sqrt{N}} \Big\{ \sum_{j \ne k} (x_j - \bar{x})[1/2 - F(y)] + (x_k - \bar{x})[(N-1)F(y) - (N-1)/2] \Big\} = \frac{\sqrt{N}}{N+1} (x_k - \bar{x})[F(y) - 1/2].$$

Hence the projection $V_p$ of $U^*$ is

$$V_p = \frac{\sqrt{N}}{N+1} \sum_{k=1}^{N} (x_k - \bar{x})[F(Y_k) - 1/2].$$

Since $F(Y_k)$ has a uniform distribution on $(0, 1)$ with mean $1/2$ and
variance $1/12$, we have

$$\operatorname{Var} V_p = \frac{N}{12(N+1)^2} \sum_{k=1}^{N} (x_k - \bar{x})^2 \to \frac{S^2}{12}.$$

From (5.1.5), $\operatorname{Var} U^* \to S^2/12$, so by the Projection Theorem 2.5.2,
$E(U^* - V_p)^2 \to 0$. Thus $U^*$ and $V_p$ have the same limiting distribution. Theorem
A10 implies that $V_p$ (and hence $U^*$) is asymptotically $n(0, S^2/12)$.

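The result in the theorem can be restated as: $U/(\operatorname{Var} U)^{1/2}$ is asymptotically
$n(0, 1)$. Hence $H_0: \beta_2 = 0$ is rejected in favor of $H_A: \beta_2 \ne 0$ at
approximate level $\alpha$ if $|U| \ge Z_{\alpha/2}(\operatorname{Var} U)^{1/2}$, where $1 - \Phi(Z_{\alpha/2}) = \alpha/2$.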
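The approximate test can be sketched in a few lines. This is an illustrative implementation, not code from the text: the function names and the data are made up, and ties among the observations are assumed not to occur. The two-sided p-value uses the normal approximation stated above.

```python
import math


def ranks(values):
    """Ranks 1..N of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def slope_rank_test(x, y):
    """Return U of (5.1.4), its standardized value, and a two-sided p-value."""
    n = len(x)
    xbar = sum(x) / n
    u = sum((xi - xbar) * ri for xi, ri in zip(x, ranks(y)))
    var_u = n * (n + 1) / 12 * sum((xi - xbar) ** 2 for xi in x)  # (5.1.5)
    z = u / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return u, z, p_two_sided


x = [1, 2, 3, 4, 5, 6]
y = [0.3, 1.1, 1.9, 3.2, 3.8, 5.5]  # made-up responses, increasing in x
u, z, p = slope_rank_test(x, y)
```

For so small a sample the normal approximation is rough; the sketch is meant only to show how the pieces of the test fit together.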

In order to discuss the estimation of $\beta_2$, introduce

$$U(\beta_2) = \sum_{i=1}^{N} (x_i - \bar{x}) R_i(\beta_2), \tag{5.1.6}$$

where $R_i(\beta_2)$ is the rank of $Y_i - \beta_1 - x_i\beta_2$, $i = 1, \ldots, N$. Note that the
rank $R_i(\beta_2)$ is invariant under changes in $\beta_1$ and can be computed with
$\beta_1 = 0$. If $\beta_2$ is the true parameter value, then $EU(\beta_2) = 0$. This implies that
the Hodges-Lehmann estimate of $\beta_2$ is the value $\hat{\beta}_2$ such that $U(\hat{\beta}_2) \doteq 0$.

Adichie (1967) was the first to introduce rank estimates in the simple
regression model. Unlike the one- and two-sample problems, $\hat{\beta}_2$ does not
have a simple representation and must be determined using numerical
methods.

To see why numerical methods are necessary, consider the graph of $U(\beta_2)$
as a function of $\beta_2$. Suppose $x_1 < x_2 < \cdots < x_N$. If $\beta_2 = (Y_j - Y_i)/(x_j - x_i)$,
$j > i$, then $Y_j - x_j\beta_2 = Y_i - x_i\beta_2$, and these residuals will be
assigned the average rank. If, on the other hand, $\beta_2 < (Y_j - Y_i)/(x_j - x_i)$,
then $Y_i - x_i\beta_2 < Y_j - x_j\beta_2$. Suppose that $\beta_2$ is also close to this slope; then
the rank of $Y_j - x_j\beta_2$ is one greater than the rank of $Y_i - x_i\beta_2$. As $\beta_2$
moves across this slope, so that $(Y_j - Y_i)/(x_j - x_i) < \beta_2$, the ranks of
$Y_i - x_i\beta_2$ and $Y_j - x_j\beta_2$ are interchanged. This shows that $U(\beta_2)$ is a
decreasing step function which steps at the $N(N-1)/2$ pairwise slopes.
When $\beta_2$ is just barely below $(Y_j - Y_i)/(x_j - x_i)$, the ranks of $Y_i - x_i\beta_2$ and
$Y_j - x_j\beta_2$ are $R_i(\beta_2)$ and $R_i(\beta_2) + 1$, respectively. They appear in $U(\beta_2)$ as
$(x_i - \bar{x})R_i(\beta_2)$ and $(x_j - \bar{x})[R_i(\beta_2) + 1]$. The change in $U(\beta_2)$ as $\beta_2$ crosses
this slope is

$$(x_i - \bar{x})R_i(\beta_2) + (x_j - \bar{x})[R_i(\beta_2) + 1] - \{(x_i - \bar{x})[R_i(\beta_2) + 1] + (x_j - \bar{x})R_i(\beta_2)\} = x_j - x_i.$$

Hence $U(\beta_2)$ steps down by the amount $x_j - x_i$ at the slope
$(Y_j - Y_i)/(x_j - x_i)$. This is in contrast to the one- and
two-sample location models, in which the steps at Walsh averages and
pairwise differences, respectively, are constant. These variable step sizes
result in a weighted median as the estimate of $\beta_2$; see Jaeckel (1972). A
naive computational approach would be to order the slopes and carry along
the $x_j - x_i$ step sizes. Then, beginning with $\max U(\beta_2) = \sum (x_i - \bar{x}) i$,
accumulate the steps until $U(\beta_2)$ crosses zero or steps onto an interval of zeros;
see Example 5.1.1. Compare the discussions in Sections 1.5, 2.3, and 3.2.
Section 5.4 has further discussion of computation. We will discuss the
statistical properties of $\hat{\beta}_2$ in the context of the general linear model in
Section 5.2, where it is shown that $N^{1/2}(\hat{\beta}_2 - \beta_2)$ is asymptotically normally
distributed.

At first thought, a natural estimate of $\beta_2$ based on $U(\beta_2)$ would be the
median of the pairwise slopes. This, however, is not the case in general, as
shown by the foregoing discussion. Exercise 5.5.1 shows that this estimate
corresponds to the test based on Kendall's $\tau$. See Sen (1968) for further
discussion.

Finally, we note that, because the ranks are invariant to a constant shift,
the present approach does not provide an estimate or test of $\beta_1$, the
intercept parameter. A natural estimate of $\beta_1$ is $\hat{\beta}_1$, the median of the
residuals $Y_1 - x_1\hat{\beta}_2, \ldots, Y_N - x_N\hat{\beta}_2$ or, if we assume symmetry of $F$, the
median of the Walsh averages of the residuals. The joint limiting
distribution of $(\hat{\beta}_1, \hat{\beta}_2)$ is discussed at the end of Section 5.2.

Example 5.1.1. Hubble's law in astronomy states that the recession
velocity of a galaxy is directly proportional to the distance. Hence, once the
constant ratio of velocity to distance is estimated, recession velocity can be
predicted from a distance estimate. In Table 5.1 we provide the distance in
millions of light years and velocity in hundreds of miles per second for 11
galactic clusters from Clason (1958, p. 337). If $x$ = distance and $y$
= velocity, then we wish to fit the model $Y = \beta x + e$. The intercept is
constrained to be zero. A plot of velocity ($y$) versus distance ($x$) shows the
points to be close to a straight line. The problem is quite simple, and we use
the data to illustrate the behavior of $U(\beta) = \sum (x_i - \bar{x}) R_i(\beta)$, where $R_i(\beta)$
is the rank of $y_i - x_i\beta$. The $\max U(\beta) = \sum (x_i - \bar{x}) i = 10{,}320$. There are
$11(10)/2 = 55$ pairwise slopes, beginning with the smallest slope 0.187
determined by Leo and Corona Borealis and a jump of 75. At 0.353, the
23rd slope, the graph of $U(\beta)$ steps from 332 to $-98$. Hence $\hat{\beta} = .353$,
close to the average ratio of .350. The estimate of $\beta$ which corresponds to
Kendall's $\tau$ (see Exercise 5.5.1) is the median of the 55 slopes. This estimate
is .359, the 28th slope. The least squares estimate is also .353.

Table 5.1. Distance and Velocity Data

Cluster             Distance (x)   Velocity (y)    y/x
Virgo                     22            7.5        .341
Pegasus                   68           24          .353
Perseus                  108           32          .296
Coma Berenices           137           47          .343
Ursa Major No. 1         255           93          .365
Leo                      315          120          .381
Corona Borealis          390          134          .344
Gemini                   405          144          .356
Bootes                   685          245          .358
Ursa Major No. 2         700          260          .371
Hydra                   1100          380          .345
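The naive computation described before the example (order the pairwise slopes, carry along the step sizes, and accumulate from the maximum of U until it crosses zero) can be sketched as follows, using the data of Table 5.1. The function names `u_stat` and `slope_estimate` are hypothetical; the sketch assumes distinct x values and no tied residuals at the points where U is evaluated.

```python
x = [22, 68, 108, 137, 255, 315, 390, 405, 685, 700, 1100]   # distance
y = [7.5, 24, 32, 47, 93, 120, 134, 144, 245, 260, 380]      # velocity


def u_stat(beta, x, y):
    """U(beta) = sum (x_i - xbar) R_i(beta); assumes no tied residuals."""
    n = len(x)
    xbar = sum(x) / n
    resid = [yi - xi * beta for xi, yi in zip(x, y)]
    order = sorted(range(n), key=lambda i: resid[i])
    return sum((x[i] - xbar) * rank for rank, i in enumerate(order, start=1))


def slope_estimate(x, y):
    """Weighted median of pairwise slopes with weights x_j - x_i.

    Starts at max U = sum (x_(i) - xbar) * i and subtracts one step per
    ordered slope until U is no longer positive.
    """
    n = len(x)
    xbar = sum(x) / n
    pts = sorted(zip(x, y))
    steps = sorted(((yj - yi) / (xj - xi), xj - xi)
                   for a, (xi, yi) in enumerate(pts)
                   for (xj, yj) in pts[a + 1:])
    u = sum((xi - xbar) * i for i, (xi, _) in enumerate(pts, start=1))
    for slope, jump in steps:
        u -= jump
        if u <= 0:
            return slope, len(steps)
    raise ValueError("U never crossed zero")


beta_hat, n_slopes = slope_estimate(x, y)
```

Sorting all N(N-1)/2 slopes is adequate for small data sets such as this one; it is the naive approach mentioned in the text, not an efficient algorithm.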

In Section 1.6 we introduced the influence curve of an estimator. Robust
estimates have bounded influence curves, so that single outlying
observations cannot have inordinately large effects on the estimates. Examples
1.6.4 and 2.4.2 show that the median and the median of the Walsh averages
both have bounded influence curves, as opposed to the linear, unbounded
influence curve of the sample mean. The discussion around (1.6.1) also
explains the connection between the influence curve and the asymptotic
variance of the estimate.

We now discuss the influence curves for the least squares and rank
estimators of the slope parameter in the simple regression model. The
results carry over to the general linear model with minor notation changes;
hence the discussion will not be repeated in the general case. Cook and
Weisberg (1982) describe in detail how influence curves and their
finite-sample counterparts can be used in diagnostic data analysis in the linear
model.

Recall the simple linear regression model is specified by $Y_i = \beta_1 +
x_i\beta_2 + e_i$, $i = 1, \ldots, N$. Without loss of generality, we will suppose the
independent variables have been centered so that $\bar{x} = 0$. Define

$$V(\beta_2) = \frac{1}{N} \sum_{i=1}^{N} x_i (Y_i - x_i\beta_2); \tag{5.1.7}$$

then the least squares estimate $\hat{\beta}_2$ is the solution of $V(\beta_2) = 0$.
Let $H_N(x, y)$ be the empirical bivariate cdf; then (5.1.7) can be written

$$V(\beta_2) = \iint x (y - x\beta_2)\, dH_N(x, y), \tag{5.1.8}$$

since $H_N(x, y)$ assigns mass $N^{-1}$ to the points $(x_i, y_i)$, $i = 1, \ldots, N$.

For a fixed value of $x$, $F(y - \beta_1 - \beta_2 x)$ represents the conditional
distribution of $Y$ given $X = x$. We generalize the problem slightly and allow
$x$ to be a realization of a random variable $X$ with marginal cdf $M(x)$. The
joint distribution of $X$ and $Y$ is denoted by $H(x, y)$. Then the parameter $\beta_2$
can be defined as the solution $\beta_2 = T(H)$ of the equation

$$\iint x[y - xT(H)]\, dH(x, y) = 0, \tag{5.1.9}$$

and when $H$ is replaced by $H_N$, as in (5.1.8), the equation becomes $V(\hat{\beta}_2) = 0$.

In order to compute the influence curve, we replace $H$ by a contaminated
version:

$$H_t(x, y) = (1 - t)H(x, y) + t\,\delta_{x_0, y_0}(x, y), \tag{5.1.10}$$

where $\delta_{x_0, y_0}(x, y)$ puts mass one at the point $(x_0, y_0)$, and differentiate with
respect to $t$; see Definition 1.6.4. Inserting $H_t$ into (5.1.9) yields

$$\iint x[y - xT(H_t)]\, d[H(x, y) + t\{\delta_{x_0, y_0}(x, y) - H(x, y)\}] = 0. \tag{5.1.11}$$

Differentiate with respect to $t$ and set $t = 0$ to get

$$-\frac{dT(H_t)}{dt}\bigg|_{t=0} \iint x^2\, dH(x, y) + \iint x[y - xT(H)]\, d\{\delta_{x_0, y_0}(x, y) - H(x, y)\} = 0. \tag{5.1.12}$$

This yields the influence curve evaluated at $x = x_0$ and $y = y_0$ as

$$\Omega(x_0, y_0) = \frac{d}{dt} T(H_t)\bigg|_{t=0} = \frac{x_0 (y_0 - x_0\beta_2)}{\int x^2\, dM(x)}, \tag{5.1.13}$$

where $\iint x^2\, dH(x, y) = \int x^2\, dM(x)$. The influence curve could have been
derived more directly from the formula for $\hat{\beta}_2$. We presented this
development because it provides an outline for the derivation of the
influence curve of the rank estimate of $\beta_2$. The main point to be noted from
(5.1.13) is that the least-squares estimate has unlimited influence in the $x$ and $y$
directions. Hence the influence is unbounded, and either an extreme $x$
value or a large residual will have a large impact on the estimate. This
generalizes the unbounded influence of the sample mean discussed in
Example 1.6.4.

The defining equation for the R estimate is given by (5.1.6). We
introduce a factor of $N^{-2}$ in order to anticipate the defining equation for
the parameter $\beta_2$. Hence we consider

$$\frac{1}{N} \sum_{i=1}^{N} x_i \left[ \frac{R_i(\beta_2)}{N} - \frac{1}{2} \right] = 0. \tag{5.1.14}$$

Since $\sum x_i = 0$, the $1/2$ does not alter the equation, but provides a convenient
centering for the influence curve later. Further, since the rank is not
affected by $\beta_1$, without loss of generality we let $\beta_1 = 0$. Now, since
$R_i(\beta_2)/N = F_N(y_i - x_i\beta_2)$, where $F_N(\cdot)$ is the empirical cdf based on the residuals
$y_i - x_i\beta_2$, $i = 1, \ldots, N$, we have

$$\iint x[F_N(y - x\beta_2) - 1/2]\, dH_N(x, y) = 0, \tag{5.1.15}$$

where $H_N$ assigns the mass $N^{-1}$ to each point $(x_i, y_i)$, $i = 1, \ldots, N$.
Let $\beta_2 = T(H)$, where $H(x, y)$ is the joint cdf of $X$ and $Y$ as described in
the preceding discussion of the least squares estimate. Then the defining
equation for $\beta_2 = T(H)$ is

$$\iint x[F(y - xT(H)) - 1/2]\, dH(x, y) = 0. \tag{5.1.16}$$


The contaminated version of $H$ is given by (5.1.10). Note that $\delta_{x_0, y_0}(x, y)
= \delta_{x_0}(x)\,\delta_{y_0}(y)$, so the degenerate distribution function at $(x_0, y_0)$ factors into
independent degenerate distribution functions. The contaminated version
of the conditional cdf of $Y$ given $X = x$ is

$$F_t(y - x\beta_2) = \begin{cases} F(y - x\beta_2) & \text{if } x \ne x_0 \\ \;\cdots & \text{if } x = x_0. \end{cases} \tag{5.1.17}$$

In the following calculations we suppose that $M(x)$ does not have a jump
at $x_0$. This is reasonable since, if $M(x)$ were the design measure assigning
$N^{-1}$ to each $x_i$, $i = 1, \ldots, N$, no mass would be assigned to the outlying
$x_0$ point.

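We now must insert $F_t$ and $H_t$ into (5.1.16), differentiate with respect to $t$,
and set $t = 0$. We first have

$$(1 - t) \iint x[F_t(y - xT(H_t)) - 1/2]\, dH(x, y) + t \iint x[F_t(y - xT(H_t)) - 1/2]\, d\delta_{x_0, y_0}(x, y) = 0.$$

Using (5.1.17) we have

$$(1 - t) \iint x[F(y - xT(H_t)) - 1/2]\, dH(x, y) + t\, x_0[F(y_0 - x_0T(H_t)) - 1/2] = 0,$$

and, after differentiating and setting $t = 0$ [the first integral vanishes at $t = 0$ by (5.1.16)],

$$-\frac{d}{dt}T(H_t)\bigg|_{t=0} \iint x^2 f(y - xT(H))\, dH(x, y) + x_0[F(y_0 - x_0T(H)) - 1/2] = 0.$$

After replacing $T(H)$ by $\beta_2$, the influence curve, evaluated at $x_0, y_0$,
becomes:

$$\Omega(x_0, y_0) = \frac{x_0[F(y_0 - x_0\beta_2) - 1/2]}{\iint x^2 f(y - x\beta_2)\, dH(x, y)}. \tag{5.1.18}$$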

Recall that $H(x, y)$ is constructed from the marginal cdf of $X$, $M(x)$, and
the conditional cdf of $Y$ given $X = x$, $F(y - x\beta_2)$. Hence

$$\iint x^2 f(y - x\beta_2)\, dH(x, y) = \iint x^2 f(y - x\beta_2)\, dF(y - x\beta_2)\, dM(x)
= \iint x^2 f^2(y - x\beta_2)\, dy\, dM(x)
= \int x^2\, dM(x) \int f^2(u)\, du.$$

The last line results when we make the change of variable $u = y - x\beta_2$,
since the inside integral will no longer depend on $x$. Thus, (5.1.18) becomes

$$\Omega(x_0, y_0) = \frac{x_0[F(y_0 - x_0\beta_2) - 1/2]}{\int f^2(u)\, du \int x^2\, dM(x)}. \tag{5.1.19}$$

This influence curve is similar to that for the median of the Walsh
averages in the one-sample problem; see (2.4.6). The influence is bounded
relative to large or extreme residuals. However, the influence is unbounded
with respect to design points. Extreme design points [sometimes called high
leverage points; see Huber (1981) and Cook and Weisberg (1982)] can have
a large impact on the R estimator. Hence we conclude that the R estimator
is robust relative to outlying residuals but not robust relative to outlying $x$
values. The same thing happens in the general linear model, so that care
must be exercised when estimating parameters in the presence of points
with high leverage. For a review of bounded influence regression see Huber
(1983).

We see from this section that the linear model subsumes a large number
of important models. In the following sections we treat these models in a
unified fashion through the linear model. In the previous chapters we first
introduced test statistics and then the estimates derived from them. We now
reverse this approach and first treat the problem of estimation. Then we
present three approaches to testing hypotheses. Finally, in Section 5.4 we
illustrate the methods on data.
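The contrast between the least-squares influence curve (5.1.13) and the rank-estimate influence curve (5.1.19) can be seen numerically. The sketch below rests on assumptions that are not in the text: the errors are taken to be normal with unit scale, so that the integral of the squared density has the closed form 1/(2*sqrt(pi)*sigma), and the second moment of the design measure is set to 1. All names are hypothetical.

```python
import math

SIGMA = 1.0   # assumed error scale (normal errors)
EX2 = 1.0     # assumed value of the integral of x^2 dM(x)
INT_F2 = 1.0 / (2.0 * math.sqrt(math.pi) * SIGMA)  # integral of f^2 for N(0, sigma^2)


def normal_cdf(u):
    """cdf F of the assumed N(0, SIGMA^2) error distribution."""
    return 0.5 * (1.0 + math.erf(u / (SIGMA * math.sqrt(2.0))))


def ic_least_squares(x0, y0, beta2):
    """(5.1.13): x0 (y0 - x0 beta2) / integral of x^2 dM."""
    return x0 * (y0 - x0 * beta2) / EX2


def ic_rank(x0, y0, beta2):
    """(5.1.19): x0 [F(y0 - x0 beta2) - 1/2] / (integral f^2 * integral x^2 dM)."""
    return x0 * (normal_cdf(y0 - x0 * beta2) - 0.5) / (INT_F2 * EX2)


# A huge residual at a moderate design point, and the same residual at a far one.
ls_big_residual = ic_least_squares(1.0, 1e6, 0.0)
rank_big_residual = ic_rank(1.0, 1e6, 0.0)
rank_far_design_point = ic_rank(10.0, 1e6, 0.0)
```

Under these assumptions the rank influence at a fixed design point never exceeds sqrt(pi) in absolute value however large the residual, while it still grows linearly in the design point, in line with the conclusion drawn from (5.1.19).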


5.2. RANK ESTIMATES IN THE LINEAR MODEL

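We refine the notation of the linear model (5.1.3) to explicitly distinguish
the intercept parameter from the regression parameters. The examples in
Section 5.1 included the intercept as part of the design when needed. We
now let

$$\mathbf{Y} = [\mathbf{1}\ \mathbf{X}] \begin{pmatrix} \alpha \\ \boldsymbol{\beta} \end{pmatrix} + \mathbf{e}, \tag{5.2.1}$$

where $\mathbf{Y}$ is an $N \times 1$ observation vector, $\mathbf{1}$ is an $N \times 1$ column vector of
ones, $\mathbf{X}$ is an $N \times p$ matrix of known regression constants, $\alpha$ is the scalar
intercept parameter, $\boldsymbol{\beta}$ is a $p \times 1$ vector of unknown regression parameters,
and $\mathbf{e}$ is an $N \times 1$ vector of i.i.d. errors with cdf $F \in \Omega_0$. Hence the median
of the distribution of $Y_i$ is $\alpha + \mathbf{x}_i'\boldsymbol{\beta}$, where $\mathbf{x}_i'$ is the $i$th row of $\mathbf{X}$.
Center the matrix $\mathbf{X}$ by subtracting column means to get $\mathbf{X}_c = \mathbf{X} -
\mathbf{1}\bar{\mathbf{x}}'$, where $\bar{\mathbf{x}}' = (\bar{x}_1, \ldots, \bar{x}_p)$ and $\bar{x}_j$ is the mean of the $j$th column of $\mathbf{X}$.
Then (5.2.1) can be written in the form

$$\mathbf{Y} = [\mathbf{1}\ \mathbf{X}_c] \begin{pmatrix} \alpha^* \\ \boldsymbol{\beta} \end{pmatrix} + \mathbf{e}, \tag{5.2.2}$$

where $\alpha^* = \alpha + \bar{\mathbf{x}}'\boldsymbol{\beta}$. Now the subspaces spanned by
$\mathbf{1}$ and the columns of $\mathbf{X}_c$ are orthogonal. This results in uncorrelated
estimates of $\alpha^*$ and $\boldsymbol{\beta}$, similar to least squares. We concentrate mainly on
estimation and testing of $\boldsymbol{\beta}$. Estimation of $\boldsymbol{\beta}$ separately from $\alpha$ or $\alpha^*$ is
effected by minimizing a measure of dispersion of residuals. The nature of
such a measure is described in the following definition.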

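Definition 5.2.1. Let $D(\cdot)$ be a measure of variability that satisfies the
following two properties: (1) $D(\mathbf{Z} + \mathbf{1}a) = D(\mathbf{Z})$ and (2) $D(-\mathbf{Z}) = D(\mathbf{Z})$ for
every $N \times 1$ vector $\mathbf{Z}$ and scalar $a$. Then $D(\cdot)$ is called an even,
location-free measure of dispersion.

If $D(\cdot)$ is even, location free, then $D(\mathbf{Y} - \mathbf{1}\alpha - \mathbf{X}\boldsymbol{\beta}) = D(\mathbf{Y} - \mathbf{1}\alpha^* -
\mathbf{X}_c\boldsymbol{\beta}) = D(\mathbf{Y} - \mathbf{X}_c\boldsymbol{\beta}) = D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$. Hence, when working with $D(\cdot)$, we
can use either $\mathbf{X}$ or $\mathbf{X}_c$ without altering the results. We write $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ or
$D(\mathbf{Y} - \mathbf{X}_c\boldsymbol{\beta})$ interchangeably. By minimizing $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ as a function of
$\boldsymbol{\beta}$, we have an estimate of $\boldsymbol{\beta}$ generated by $D(\cdot)$. For example, if $D(\mathbf{Z})
= (\mathbf{Z} - \mathbf{1}\bar{Z})'(\mathbf{Z} - \mathbf{1}\bar{Z})$, where $\bar{Z} = N^{-1}\sum Z_i$, then $D(\mathbf{Y} - \mathbf{X}_c\boldsymbol{\beta}) = (\mathbf{Y} - \mathbf{X}_c\boldsymbol{\beta})'
(\mathbf{Y} - \mathbf{X}_c\boldsymbol{\beta}) - N\bar{Y}^2$. The resulting estimate of $\boldsymbol{\beta}$ is the usual least-squares
estimate.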

Our goal is to define an even, location-free measure of dispersion that
produces a rank estimate of $\boldsymbol{\beta}$. This estimate will be considered the
extension of the Hodges-Lehmann estimate from the two-sample location
model to the regression model; see Example 5.2.1.

Suppose $a(1) \le \cdots \le a(N)$ is a nonconstant sequence of scores such that
$\sum a(i) = 0$. They can be constructed using the score-generating
functions discussed in Chapter 3. See Definition 3.4.2 for an example. Jaeckel
(1972) defined the following measure of dispersion of the vector $\mathbf{Z}' =
(Z_1, \ldots, Z_N)$:

$$D(\mathbf{Z}) = \sum_{i=1}^{N} a(i) Z_{(i)}, \tag{5.2.3}$$

where $Z_{(1)} \le \cdots \le Z_{(N)}$. Further, if $R_1, \ldots, R_N$ are the ranks of
$Z_1, \ldots, Z_N$, then we can also write

$$D(\mathbf{Z}) = \sum_{i=1}^{N} a(R_i) Z_i, \tag{5.2.4}$$

where we assign the average score to tied $Z$ values.
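Jaeckel's dispersion in the ordered form (5.2.3) is easy to compute, and its two defining properties can be checked directly. The sketch below is illustrative: the function names and the data are made up, and the scores are the Wilcoxon scores $a(i) = \sqrt{12}\,[i/(N+1) - 1/2]$, which sum to zero and are antisymmetric about the middle rank.

```python
import math


def wilcoxon_scores(n):
    """a(i) = sqrt(12) * (i/(n+1) - 1/2), i = 1..n; the scores sum to zero."""
    return [math.sqrt(12.0) * (i / (n + 1) - 0.5) for i in range(1, n + 1)]


def dispersion(z):
    """D(Z) = sum a(i) Z_(i): scores paired with the ordered values, (5.2.3)."""
    a = wilcoxon_scores(len(z))
    return sum(ai * zi for ai, zi in zip(a, sorted(z)))


z = [2.5, -1.0, 0.3, 7.2, -4.4]          # made-up values
d = dispersion(z)
d_shifted = dispersion([zi + 10.0 for zi in z])   # location-free property
d_negated = dispersion([-zi for zi in z])         # even property
d_constant = dispersion([3.0] * 5)                # no spread at all
```

Working with the sorted values sidesteps the treatment of ties: tied values receive adjacent scores, and the sum is the same as if each had been given the average score.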


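Definition 5.2.2. A rank estimate (R estimate) of $\boldsymbol{\beta}$ is the value $\hat{\boldsymbol{\beta}}$ which
minimizes

$$D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \sum_{i=1}^{N} a[R(Y_i - \mathbf{x}_i'\boldsymbol{\beta})](Y_i - \mathbf{x}_i'\boldsymbol{\beta}), \tag{5.2.5}$$

where $\mathbf{x}_i'$ is the $i$th row of $\mathbf{X}$ and $R(Y_i - \mathbf{x}_i'\boldsymbol{\beta})$ is the rank of $Y_i - \mathbf{x}_i'\boldsymbol{\beta}$ among
$Y_1 - \mathbf{x}_1'\boldsymbol{\beta}, \ldots, Y_N - \mathbf{x}_N'\boldsymbol{\beta}$.

This even, location-free measure is a linear, rather than quadratic,
function of the residuals, with coefficients determined by the rank, or size,
of the residuals. Hence it is hoped that the estimates generated by (5.2.5)
will be more robust than least-squares estimates because the influence of
outliers enters in a linear rather than quadratic fashion. The next theorem
shows that $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is a proper measure of dispersion.

Theorem 5.2.1. The function $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is a nonnegative, continuous,
and convex function of $\boldsymbol{\beta}$.

Proof. Let $Z_i = Y_i - \mathbf{x}_i'\boldsymbol{\beta}$, $i = 1, \ldots, N$. Let $t$ be such that $a(1) \le \cdots
\le a(t - 1) \le 0 \le a(t) \le \cdots \le a(N)$. Then

$$D(\mathbf{Z}) = \sum_{i=1}^{N} a(i)[Z_{(i)} - Z_{(t)}],$$

since $\sum a(i) = 0$. But each term $a(i)[Z_{(i)} - Z_{(t)}] \ge 0$;
hence $D(\mathbf{Z}) = D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \ge 0$.

Now let $p = (p(1), \ldots, p(N))$ be a permutation of $1, \ldots, N$ and define

$$D_p(\mathbf{Z}) = \sum_{i=1}^{N} a(i) Z_{p(i)}.$$

Theorem 368 of Hardy et al. (1952) states that

$$\max_p \sum_{i=1}^{N} a(i) Z_{p(i)} = \sum_{i=1}^{N} a(i) Z_{(i)}.$$

Hence

$$D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \max_p \sum_{i=1}^{N} a(i)(Y_{p(i)} - \mathbf{x}_{p(i)}'\boldsymbol{\beta}).$$

This shows that $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is the maximum of a finite set of linear
functions. It then follows that $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is a continuous and convex
function of $\boldsymbol{\beta}$.

Jaeckel (1972) further points out that if $\mathbf{X}$ has full rank, then $D(\mathbf{Y} -
\mathbf{X}\boldsymbol{\beta})$ attains its minimum, and the set of $\boldsymbol{\beta}$ for which this occurs is bounded.
Hence the estimate in Definition 5.2.2 may be taken to be any value which
minimizes $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$.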

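The domain ($\boldsymbol{\beta}$ space) of $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is divided into a finite number of
convex polygonal subsets, on each of which $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is a linear function
of $\boldsymbol{\beta}$. The partial derivatives (gradient) of $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ exist almost
everywhere and are given by

$$\frac{\partial}{\partial \beta_j} D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = -\sum_{i=1}^{N} x_{ij}\, a[R(Y_i - \mathbf{x}_i'\boldsymbol{\beta})] = -\sum_{i=1}^{N} (x_{ij} - \bar{x}_j)\, a[R(Y_i - \mathbf{x}_i'\boldsymbol{\beta})] \tag{5.2.6}$$

for $j = 1, \ldots, p$. We have used $\sum a(i) = 0$ in the second equality to introduce
the centered design. Hence, for each $j = 1, \ldots, p$, the negative of the
partial derivative of the dispersion is the regression rank statistic:

$$S_j(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \sum_{i=1}^{N} (x_{ij} - \bar{x}_j)\, a[R(Y_i - \mathbf{x}_i'\boldsymbol{\beta})]. \tag{5.2.7}$$

If we denote the gradient of $D(\cdot)$ by $\nabla D(\cdot)$ and let $\mathbf{S}'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = (S_1(\mathbf{Y} -
\mathbf{X}\boldsymbol{\beta}), \ldots, S_p(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}))$, then the analog of the linear normal equations of
least squares is the set of nonlinear equations

$$-\nabla D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{S}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \doteq \mathbf{0}. \tag{5.2.8}$$

We have approximately $\mathbf{0}$ because of the discrete nature of the regression
rank statistics. This extends the discussion of simple regression at the end of
Section 5.1. Hence minimizing $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is equivalent to solving (5.2.8).
But this is the extension of the Hodges-Lehmann estimate to the linear
model, since $E\mathbf{S}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{0}$ when $\boldsymbol{\beta}$ is the true parameter value. The
reader is asked to verify that $E\mathbf{S}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{0}$ in Exercise 5.5.2.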
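The relation between the dispersion and the regression rank statistic can be checked numerically in the simple regression case (p = 1): away from the pairwise slopes, D is linear in the slope, so a central difference recovers its derivative, which should equal the negative of S. The sketch uses Wilcoxon scores; the function names and the data are made up for illustration.

```python
import math


def scores_of_ranks(resid):
    """Wilcoxon score a(R_i) for each residual (assumes no ties)."""
    n = len(resid)
    order = sorted(range(n), key=lambda i: resid[i])
    a = [0.0] * n
    for rank, i in enumerate(order, start=1):
        a[i] = math.sqrt(12.0) * (rank / (n + 1) - 0.5)
    return a


def dispersion(beta, x, y):
    """D(Y - x beta) = sum a(R_i) (Y_i - x_i beta), as in (5.2.5)."""
    resid = [yi - xi * beta for xi, yi in zip(x, y)]
    return sum(ai * ri for ai, ri in zip(scores_of_ranks(resid), resid))


def rank_statistic(beta, x, y):
    """S(Y - x beta) = sum (x_i - xbar) a(R_i), as in (5.2.7)."""
    xbar = sum(x) / len(x)
    resid = [yi - xi * beta for xi, yi in zip(x, y)]
    return sum((xi - xbar) * ai for xi, ai in zip(x, scores_of_ranks(resid)))


x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [1.2, 1.9, 4.5, 6.1, 12.0]   # made-up responses
beta, h = 1.0, 1e-6              # beta = 1.0 is not a pairwise slope of these data
numeric_gradient = (dispersion(beta + h, x, y)
                    - dispersion(beta - h, x, y)) / (2.0 * h)
s_at_beta = rank_statistic(beta, x, y)
```

The same data also show that S is nonincreasing in the slope, which is why the minimizer of D can be located by finding where S changes sign.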

As in previous chapters we are primarily concerned with the Wilcoxon
scores. Let $a(i) = \phi(i/(N + 1))$, where

$$\phi(u) = \sqrt{12}\,(u - 1/2). \tag{5.2.9}$$

This standardization is convenient since $\int \phi(u)\, du = 0$ and $\int \phi^2(u)\, du = 1$.

In the simple linear regression problem, (5.2.8) corresponds to (5.1.6).
The following example illustrates the preceding ideas on the two-sample
location model.

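Example 5.2.1. The two-sample location model can be expressed as
follows:

$$\mathbf{Y} = \begin{bmatrix} \mathbf{1}_n & \mathbf{1}_n \\ \mathbf{1}_m & \mathbf{0}_m \end{bmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \mathbf{e}.$$

Hence, if $m = N - n$, the first $n$ observations are from a distribution with
median $\alpha + \beta$ and the last $m$ observations are from a distribution with
median $\alpha$. The difference in population medians is $\Delta = \alpha + \beta - \alpha = \beta$. In
this example $p = 1$, so we suppress the subscript on $S_1(\mathbf{Y} - \mathbf{X}\beta)$. The vector
$\mathbf{X}$ is the second column in the full design matrix. Using the Wilcoxon score
function, (5.2.9), we have

$$D(\mathbf{Y} - \mathbf{X}\beta) = \frac{\sqrt{12}}{N+1} \sum_{i=1}^{N} [R(Y_i - x_i\beta) - (N+1)/2](Y_i - x_i\beta),$$

$$S(\mathbf{Y} - \mathbf{X}\beta) = \frac{\sqrt{12}}{N+1} \sum_{i=1}^{N} (x_i - \bar{x})[R(Y_i - x_i\beta) - (N+1)/2]
= \frac{\sqrt{12}}{N+1} \sum_{i=1}^{n} [R(Y_i - \beta) - (N+1)/2].$$

The third expression follows easily by noting $\sum_{i=1}^{N} [R(Y_i - x_i\beta) - (N+1)/2] = 0$.
Now, if $S(\mathbf{Y} - \mathbf{X}\hat{\beta}) \doteq 0$, then $\sum_{i=1}^{n} R(Y_i - \hat{\beta}) \doteq n(N+1)/2$ and $\hat{\beta}$
is the median of the pairwise differences across the two samples, since we
have the rank-sum form of the Mann-Whitney-Wilcoxon statistic.
[Compare (3.2.4).] For future reference we note that for this example

$$\mathbf{X}_c' = \frac{1}{N}(m, \ldots, m, -n, \ldots, -n) \quad \text{and} \quad \mathbf{X}_c'\mathbf{X}_c = \frac{mn}{N},$$

where there are $n$ $m$'s and $m$ $-n$'s.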
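For the two-sample model the claim that the median of the pairwise differences minimizes the dispersion can be checked on a small data set. The sketch below is illustrative only: the samples are made up, the function name is hypothetical, and Wilcoxon scores are used.

```python
import math
from statistics import median


def dispersion_two_sample(beta, first, second):
    """D(Y - X beta) with Wilcoxon scores; x_i = 1 for `first`, 0 for `second`."""
    resid = sorted([v - beta for v in first] + list(second))
    n = len(resid)
    return sum(math.sqrt(12.0) * (i / (n + 1) - 0.5) * r
               for i, r in enumerate(resid, start=1))


first = [5.1, 6.3, 4.8, 7.0]          # made-up sample with median alpha + beta
second = [3.9, 4.4, 5.0, 3.2, 4.1]    # made-up sample with median alpha
differences = [a - b for a in first for b in second]
beta_hat = median(differences)         # Hodges-Lehmann shift estimate
d_min = dispersion_two_sample(beta_hat, first, second)
```

Because D is convex and piecewise linear, comparing `d_min` with the dispersion at every pairwise difference, and at a few points on either side, is enough to confirm that `beta_hat` lies in the minimizing set.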

We now develop the limiting distribution for $N^{1/2}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0)$, where $\hat{\boldsymbol{\beta}}$ is
the R estimate found by minimizing (5.2.5) or solving (5.2.8) and $\boldsymbol{\beta}_0$ is the
true parameter value. Recall from Section 2.7 that the Wilcoxon signed
rank statistic, suitably standardized, is approximately linear in an appropriate
sense, and this leads to a heuristic derivation of the asymptotic normality of
$n^{1/2}(\hat{\theta} - \theta_0)$. We have a similar result for the regression rank statistic
$\mathbf{S}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$. The strategy for determining the limiting distribution is as
follows: (1) from the linear approximation to $\mathbf{S}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = -\nabla D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ we
construct a quadratic approximation to $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$; (2) for the value
$\tilde{\boldsymbol{\beta}}$ which minimizes the quadratic approximation, it will be shown that
$N^{1/2}(\tilde{\boldsymbol{\beta}} - \boldsymbol{\beta}_0)$ has a limiting normal distribution; and then (3) we will show
that $\hat{\boldsymbol{\beta}}$ is close to $\tilde{\boldsymbol{\beta}}$ and $N^{1/2}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0)$ has the same limiting distribution as
$N^{1/2}(\tilde{\boldsymbol{\beta}} - \boldsymbol{\beta}_0)$.

We begin by stating the linearity result for the vector of regression rank
statistics. Let $\boldsymbol{\beta}_0$ denote the true parameter value and suppose we are using
the Wilcoxon scores, (5.2.9). Then we have

$$\frac{1}{\sqrt{N}} \mathbf{S}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \frac{1}{\sqrt{N}} \mathbf{S}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_0) - \sqrt{12} \int f^2(x)\, dx\, \frac{1}{N} \mathbf{X}_c'\mathbf{X}_c \sqrt{N}(\boldsymbol{\beta} - \boldsymbol{\beta}_0) + o_p(1), \tag{5.2.10}$$

where $o_p(1)$ tends to zero in probability uniformly for all vectors $\boldsymbol{\beta}$ such
that $\sqrt{N}\,\|\boldsymbol{\beta} - \boldsymbol{\beta}_0\| \le C$, for any $C > 0$. Here $\|\cdot\|$ denotes Euclidean
length. This result is proved by Aubuchon (1982) under the mild
assumptions listed in (5.2.11) later. The proof is similar to, but more complex than,
that of Theorem 2.7.2. Jureckova (1969, 1971) establishes (5.2.10) for
general scores with $\sqrt{12} \int f^2(x)\, dx$ replaced by $\int \phi(u)\phi_f(u)\, du$, where $\phi(u)$ is
a score-generating function such that $\int \phi(u)\, du = 0$, $\int \phi^2(u)\, du = 1$, and that
satisfies Definition 3.4.2; and $\phi_f(u)$ is given in (3.5.14). Her proof requires
more technical assumptions on the structure of the design matrix.

We gather together at this point the assumptions that are made
throughout this chapter. We will indicate from time to time more general results for
arbitrary score functions and refer the reader to the primary sources for
proofs. Henceforth, we assume:

a. $\phi(u) = \sqrt{12}\,(u - 1/2)$, $a(i) = \sqrt{12}\,[i/(N + 1) - 1/2]$.

b. $[\mathbf{1}\ \mathbf{X}]$ has full column rank, $p + 1$.

c. $N^{-1}[\mathbf{1}\ \mathbf{X}]'[\mathbf{1}\ \mathbf{X}]$ converges to a positive definite matrix,
which implies that $N^{-1}\mathbf{X}_c'\mathbf{X}_c \to \boldsymbol{\Sigma}$, positive definite.
(See the following lemma.)                                        (5.2.11)

d. $F \in \Omega_0$ and $f(\cdot)$ has finite Fisher information (Definition 2.9.1).

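Two implications of these assumptions are isolated in the following
lemma for future reference. The proof is due to Aubuchon (1982).

Lemma. Condition (c) implies that $N^{-1}\mathbf{X}_c'\mathbf{X}_c \to \boldsymbol{\Sigma}$, positive definite, and
condition (d) implies that $\int_{-\infty}^{\infty} f^2(x)\, dx < \infty$.

Proof. Note first that

$$N^{-1}[\mathbf{1}\ \mathbf{X}]'[\mathbf{1}\ \mathbf{X}] = \begin{bmatrix} 1 & \bar{\mathbf{x}}' \\ \bar{\mathbf{x}} & N^{-1}\mathbf{X}'\mathbf{X} \end{bmatrix} = \mathbf{A}_N \to \mathbf{A},$$

where $\bar{\mathbf{x}}' = (\bar{x}_1, \ldots, \bar{x}_p)$ and the limit is positive definite. Now define the
following convergent matrix:

$$\mathbf{B}_N = \begin{bmatrix} 1 & -\bar{\mathbf{x}}' \\ \mathbf{0} & \mathbf{I} \end{bmatrix} \to \mathbf{B}.$$

Then

$$\mathbf{B}_N'\mathbf{A}_N\mathbf{B}_N = \begin{bmatrix} 1 & \mathbf{0}' \\ \mathbf{0} & N^{-1}\mathbf{X}_c'\mathbf{X}_c \end{bmatrix}$$

converges to $\mathbf{B}'\mathbf{A}\mathbf{B}$, which is positive definite, and the first part of the
lemma is established.

For the second part, note that if $Y$ has cdf $F \in \Omega_0$, then $-Y$ has cdf
$1 - F(-y) \in \Omega_0$ with pdf $f(-y)$. Now let $X$ and $-Y$ be independent with
pdfs $f(x)$ and $f(-y)$, respectively. Theorem A17 in the Appendix implies
that the convolution density has the form

$$f^*(t) = \int f(t + y) f(y)\, dy \le \sup_x f(x) \le \int |f'(x)|\, dx \le \left[ \int \left( \frac{f'(x)}{f(x)} \right)^2 f(x)\, dx \right]^{1/2} < \infty.$$

We have used the absolute continuity, the Cauchy-Schwarz inequality
(Theorem A19), and Definition 2.9.1. Now $f^*(0) = \int_{-\infty}^{\infty} f^2(x)\, dx < \infty$.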

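The result in (5.2.10), valid under the assumptions in (5.2.11), provides a
linear approximation to the gradient $\nabla D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$. This suggests how to
construct a quadratic approximation to $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$. Suppose $N^{-1}\mathbf{X}_c'\mathbf{X}_c$
converges to $\boldsymbol{\Sigma}$, a positive definite matrix, and suppose $\boldsymbol{\beta}_0$ is the true
parameter value. Then define

$$Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_0) - (\boldsymbol{\beta} - \boldsymbol{\beta}_0)'\mathbf{S}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_0) + \frac{\sqrt{12}}{2} \int f^2(x)\, dx\, N(\boldsymbol{\beta} - \boldsymbol{\beta}_0)'\boldsymbol{\Sigma}(\boldsymbol{\beta} - \boldsymbol{\beta}_0). \tag{5.2.12}$$

This quadratic in $\boldsymbol{\beta}$ has the property that $Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_0) = D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_0)$, and
the gradient of $Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is the linear approximation on the right side of
(5.2.10). The next theorem, proved by Jaeckel (1972), shows that $Q(\mathbf{Y} -
\mathbf{X}\boldsymbol{\beta})$ provides a useful approximation to $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$.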

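Theorem 5.2.2. For any $B > 0$ and $\epsilon > 0$, under the assumptions in
(5.2.11),

$$P\Big\{ \sup_{\sqrt{N}\|\boldsymbol{\beta} - \boldsymbol{\beta}_0\| \le B} |Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) - D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})| \ge \epsilon \Big\} \to 0.$$

Proof. Let $\boldsymbol{\sigma}_j'$ be the $j$th row of $\boldsymbol{\Sigma}$. Then from (5.2.10), for all $j = 1, \ldots, p$,

$$P\Big\{ \sup_{\sqrt{N}\|\boldsymbol{\beta} - \boldsymbol{\beta}_0\| \le B} \frac{1}{\sqrt{N}} \Big| S_j(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) - S_j(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_0) + \sqrt{12} \int f^2(x)\, dx\, N \boldsymbol{\sigma}_j'(\boldsymbol{\beta} - \boldsymbol{\beta}_0) \Big| \ge \frac{\epsilon}{Bp} \Big\} \to 0.$$

Hence

$$P\Big\{ \sup_{\sqrt{N}\|\boldsymbol{\beta} - \boldsymbol{\beta}_0\| \le B} \frac{1}{\sqrt{N}} \Big| \frac{\partial}{\partial \beta_j} Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) - \frac{\partial}{\partial \beta_j} D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \Big| \ge \frac{\epsilon}{Bp} \Big\} \to 0. \tag{5.2.13}$$

Consider a point $\boldsymbol{\beta}^*$ such that $N^{1/2}\|\boldsymbol{\beta}^* - \boldsymbol{\beta}_0\| \le B$. The point $\boldsymbol{\beta}_t = t\boldsymbol{\beta}^* +
(1 - t)\boldsymbol{\beta}_0 = \boldsymbol{\beta}_0 + t(\boldsymbol{\beta}^* - \boldsymbol{\beta}_0)$, for $0 \le t \le 1$, is on the line segment
joining $\boldsymbol{\beta}_0$ and $\boldsymbol{\beta}^*$, and

$$\frac{d}{dt}[Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_t) - D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_t)] = \sum_{j=1}^{p} (\beta_j^* - \beta_{0j}) \Big[ \frac{\partial}{\partial \beta_j} Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_t) - \frac{\partial}{\partial \beta_j} D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_t) \Big].$$

With high probability, for large $N$, from (5.2.13) we can conclude that

$$\Big| \frac{d}{dt}[Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_t) - D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_t)] \Big| < \frac{\sqrt{N}\,\epsilon}{Bp} \sum_{j=1}^{p} |\beta_j^* - \beta_{0j}| \le \frac{\sqrt{N}\,\epsilon}{B} \|\boldsymbol{\beta}^* - \boldsymbol{\beta}_0\| \le \epsilon.$$

Now define $h(t) = Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_t) - D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_t)$ and recall $\boldsymbol{\beta}_t = t\boldsymbol{\beta}^* +
(1 - t)\boldsymbol{\beta}_0$. Note $h(t)$ is differentiable almost everywhere and

$$|h(1) - h(0)| \le \int_0^1 |h'(t)|\, dt.$$

Furthermore, $h(0) = Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_0) - D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_0) = 0$ and $|h'(t)| < \epsilon$ for
$0 \le t \le 1$. Hence $|h(1)| < \epsilon$ and we have $|Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}^*) - D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}^*)| < \epsilon$
for $\boldsymbol{\beta}^*$ such that $N^{1/2}\|\boldsymbol{\beta}^* - \boldsymbol{\beta}_0\| \le B$, with high probability. Since $Q(\mathbf{Y} -
\mathbf{X}\boldsymbol{\beta}) - D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is continuous,

$$\sup_{\sqrt{N}\|\boldsymbol{\beta} - \boldsymbol{\beta}_0\| \le B} |Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) - D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})| \le \epsilon$$

with high probability, and this completes the proof.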


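Recall $\hat{\boldsymbol{\beta}}$ minimizes $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ and solves $\mathbf{S}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \doteq \mathbf{0}$. Define

$$\tilde{\boldsymbol{\beta}} = \boldsymbol{\beta}_0 + \frac{1}{\sqrt{12} \int f^2(x)\, dx}\, \frac{1}{N}\, \boldsymbol{\Sigma}^{-1} \mathbf{S}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_0), \tag{5.2.14}$$

where $\boldsymbol{\beta}_0$ is the true, unknown parameter value. Then $\tilde{\boldsymbol{\beta}}$ minimizes the
quadratic approximation $Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ and solves the right side of (5.2.10),
the linear approximation. The reader is asked to find the limiting
distribution of $N^{1/2}(\tilde{\boldsymbol{\beta}} - \boldsymbol{\beta}_0)$ in Exercise 5.5.3(b). We are now ready to show that $\hat{\boldsymbol{\beta}}$
behaves asymptotically like $\tilde{\boldsymbol{\beta}}$. Jaeckel (1972) proves the result for general
scores.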

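Theorem 5.2.3. Let $\hat{\boldsymbol{\beta}}$ be any point that minimizes $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$. Then, if $\boldsymbol{\beta}_0$
is the true parameter value, under assumptions in (5.2.11),

$$\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0) \xrightarrow{D} \mathbf{Z} \sim MVN\left( \mathbf{0},\ \frac{1}{12\left( \int f^2(x)\, dx \right)^2}\, \boldsymbol{\Sigma}^{-1} \right).$$

Proof. Choose $\epsilon > 0$ and $\delta > 0$. The asymptotic normality of $N^{1/2}(\tilde{\boldsymbol{\beta}} -
\boldsymbol{\beta}_0)$, Exercise 5.5.3, implies that there exists $C_0$ such that

$$P\{ \|\tilde{\boldsymbol{\beta}} - \boldsymbol{\beta}_0\| \ge C_0/\sqrt{N} \} < \delta/2 \tag{5.2.15}$$

for sufficiently large $N$.

Define (see Exercise 5.5.12)

$$T = \min\{ Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) : \|\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}}\| = \epsilon/\sqrt{N} \} - Q(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}}).$$

Note that $T > 0$ since $\tilde{\boldsymbol{\beta}}$ is the unique minimum of $Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$. Then, by
Theorem 5.2.2 we have

$$P\Big\{ \sup_{\|\boldsymbol{\beta} - \boldsymbol{\beta}_0\| \le (C_0 + \epsilon)/\sqrt{N}} |Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) - D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})| \ge T/2 \Big\} < \delta/2. \tag{5.2.16}$$

Now, for large $N$, with probability greater than $1 - \delta$, (5.2.15) and
(5.2.16) imply $\|\tilde{\boldsymbol{\beta}} - \boldsymbol{\beta}_0\| < C_0/\sqrt{N}$ and

$$D(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}}) < Q(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}}) + T/2.$$

For all $\boldsymbol{\beta}$ such that $\|\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}}\| = \epsilon/N^{1/2}$, $\|\boldsymbol{\beta} - \boldsymbol{\beta}_0\| \le \|\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}}\| + \|\tilde{\boldsymbol{\beta}} -
\boldsymbol{\beta}_0\| < \epsilon/N^{1/2} + C_0/N^{1/2} = (C_0 + \epsilon)/N^{1/2}$. Hence, by (5.2.16), $|Q(\mathbf{Y} -
\mathbf{X}\boldsymbol{\beta}) - D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})| < T/2$ implies that $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) > Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) - T/2$.
Therefore

$$\begin{aligned} D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) &> Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) - T/2 \\ &\ge \min\{ Q(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) : \|\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}}\| = \epsilon/\sqrt{N} \} - T/2 \\ &= T + Q(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}}) - T/2 \\ &= T/2 + Q(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}}) > D(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}}). \end{aligned}$$

Hence $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) > D(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}})$ for all $\boldsymbol{\beta}$ such that $\|\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}}\| = \epsilon/N^{1/2}$.
The convexity of $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$, Theorem 5.2.1, implies that $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})
> D(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}})$ for all $\boldsymbol{\beta}$ such that $\|\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}}\| \ge \epsilon/N^{1/2}$. Thus $D(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}})
\ge \min D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = D(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})$, which implies $\|\hat{\boldsymbol{\beta}} - \tilde{\boldsymbol{\beta}}\| < \epsilon/N^{1/2}$. But
this says $N^{1/2}\|\hat{\boldsymbol{\beta}} - \tilde{\boldsymbol{\beta}}\| < \epsilon$ for large $N$, with high probability.

Hence $N^{1/2}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0) - N^{1/2}(\tilde{\boldsymbol{\beta}} - \boldsymbol{\beta}_0)$ converges to $\mathbf{0}$ in probability, and
they have the same limiting distribution, given in Exercise 5.5.3. This
completes the proof.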

Jaeckel (1972) proves that $B_N = \{\boldsymbol{\beta} : D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \min\}$ is bounded.
Since $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is continuous and constant on $B_N$, $B_N$ is also closed.
These facts, along with the last theorem, show that $N^{1/2}$(diameter of $B_N$)
converges to 0 in probability. Hence, for moderate to large $N$, the set of
points that minimizes $D(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ should be quite small.

As a simple application, recall the two-sample location problem discussed
in Example 5.2.1. It was pointed out that $\mathbf{X}_c'\mathbf{X}_c = mn/N$, where $m + n = N$.
If we suppose that $m/N \to \lambda$, $0 < \lambda < 1$, then $N^{-1}\mathbf{X}_c'\mathbf{X}_c \to \lambda(1 - \lambda)$. The
estimate $\hat{\beta}$ is the median of the $mn$ differences, and the last theorem implies
that $N^{1/2}(\hat{\beta} - \beta_0)$ is asymptotically $n(0, 1/\{\lambda(1 - \lambda)12[\int f^2(x)\, dx]^2\})$. This
result was first mentioned after (3.5.10), the efficacy of the
Mann-Whitney-Wilcoxon test. In the simple regression model discussed in the
previous section, $N^{1/2}(\hat{\beta}_2 - \beta_2)$ has a limiting normal distribution with
mean 0 and variance $1/\{S^2\, 12[\int f^2(x)\, dx]^2\}$, where $S^2 = \lim N^{-1}\sum (x_i - \bar{x})^2$.

Wilks (1962, p. 547) defines the generalized variance of a multivariate
estimator as the determinant of its variance-covariance matrix. We then
define the asymptotic relative efficiency of two asymptotically multivariate
normal estimators as the $(1/p)$th root of the reciprocal ratio of their
asymptotic generalized variances. See Bickel (1964, Section 4). In the
Appendix, following Theorem A13, it is shown that if $\boldsymbol{\beta}^*$ is the
least-squares estimator, then $N^{1/2}(\boldsymbol{\beta}^* - \boldsymbol{\beta}_0)$ is asymptotically $MVN(\mathbf{0}, \sigma^2\boldsymbol{\Sigma}^{-1})$,
where $\sigma^2$ is the variance of $F$. Hence the asymptotic efficiency of the rank
estimate relative to the least-squares estimate is

$$e(\hat{\boldsymbol{\beta}}, \boldsymbol{\beta}^*) = \left\{ \frac{|\sigma^2\boldsymbol{\Sigma}^{-1}|}{\left| \boldsymbol{\Sigma}^{-1} \big/ 12\left( \int f^2(x)\, dx \right)^2 \right|} \right\}^{1/p} = 12\sigma^2 \left( \int f^2(x)\, dx \right)^2. \tag{5.2.17}$$

But this is the efficiency of the Wilcoxon methods relative to the
least-squares methods and has been discussed extensively throughout the book.
See Section 2.6.
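The efficiency (5.2.17) depends on the error distribution only through its variance and the integral of the squared density, so it can be evaluated numerically for any density. The sketch below is illustrative: the function names are hypothetical, a plain trapezoid rule on a truncated interval is assumed to be accurate enough, and the two densities (standard normal and double exponential) are chosen only as familiar examples.

```python
import math


def integrate(g, lo, hi, steps=200_000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, steps):
        total += g(lo + k * h)
    return total * h


def wilcoxon_vs_least_squares(pdf, lo=-40.0, hi=40.0):
    """e = 12 * sigma^2 * (integral of f^2)^2, as in (5.2.17)."""
    mean = integrate(lambda u: u * pdf(u), lo, hi)
    var = integrate(lambda u: (u - mean) ** 2 * pdf(u), lo, hi)
    int_f2 = integrate(lambda u: pdf(u) ** 2, lo, hi)
    return 12.0 * var * int_f2 ** 2


def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def laplace_pdf(u):
    return 0.5 * math.exp(-abs(u))


e_normal = wilcoxon_vs_least_squares(normal_pdf)
e_laplace = wilcoxon_vs_least_squares(laplace_pdf)
```

For the normal density the value agrees with the closed form 3/pi (about 0.955), and for the double exponential it is 1.5, so the rank estimate loses little at the normal model and gains at the heavier-tailed one.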

The estimator /3, except in the location model, is not simple to compute. 
Search methods such as steepest descent can be used to find the minimum 
of D(Y - X/3) since it is a convex surface. The Minitab computing system 
(see Ryan et al., 1981) contains a RANK REGRESSION command that 
provides /3 in its output. 

The linear approximation, (5.2.10), and its solution $\tilde\beta$, (5.2.14), can be used to construct a one-step R estimator. Let $\hat\beta_0$ denote an initial estimate of $\beta$, perhaps the least-squares estimate. Suppose that $\hat\tau$ is a consistent estimate of

$$\tau = \frac{1}{\sqrt{12}\int f^2(x)\,dx}. \tag{5.2.18}$$

Then, replacing $\Sigma^{-1}$ by $N(X_c'X_c)^{-1}$ in (5.2.14), define

$$\hat\beta_1 = \hat\beta_0 + \hat\tau\,(X_c'X_c)^{-1}S(Y - X\hat\beta_0). \tag{5.2.19}$$

The estimate $\hat\beta_1$ is a one-step estimator. By iterating this formula a $k$-step estimator can be quickly constructed.
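
The update (5.2.19) can be sketched numerically. The block below is only an illustration under stated assumptions, not the book's algorithm: it assumes the Wilcoxon scores have the form $a(i) = \sqrt{12}\,(i/(N+1) - 1/2)$, that the residuals have no ties, and that the caller supplies an estimate of $\tau$. The function names `wilcoxon_scores`, `one_step_r`, and `k_step_r` are hypothetical.

```python
import numpy as np

def wilcoxon_scores(resid):
    """Scores a(R_i) = sqrt(12) * (R_i / (N + 1) - 1/2); assumes no ties."""
    resid = np.asarray(resid, dtype=float)
    ranks = np.argsort(np.argsort(resid)) + 1        # ranks 1..N
    return np.sqrt(12.0) * (ranks / (len(resid) + 1.0) - 0.5)

def one_step_r(y, X, beta0, tau_hat):
    """One application of (5.2.19) starting from the initial estimate beta0."""
    Xc = X - X.mean(axis=0)                           # centered design matrix
    S = Xc.T @ wilcoxon_scores(y - X @ beta0)         # rank statistics S(Y - X beta0)
    return beta0 + tau_hat * np.linalg.solve(Xc.T @ Xc, S)

def k_step_r(y, X, beta0, tau_hat, k):
    """Iterate the one-step formula k times (tau_hat is held fixed here)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(k):
        beta = one_step_r(y, X, beta, tau_hat)
    return beta
```

Here `X` is the $N \times p$ matrix of regressors without the column of ones, and `beta0` might be the least-squares slope vector. Holding `tau_hat` fixed across iterations is a simplification; it could equally be re-estimated from the current residuals at each step.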

In the one-sample location model, in which we assume a symmetric underlying distribution, $F \in \Omega_s$, it was pointed out in (2.7.8) that the length of the Wilcoxon confidence interval, when properly standardized, provides a consistent estimate of $\tau$. If we compute this estimate on the residuals $Y_1 - x_1'\hat\beta, \ldots, Y_N - x_N'\hat\beta$ then McKean and Hettmansperger (1976) showed that this estimate is also consistent in the linear model. Hence, if we are willing to suppose $F \in \Omega_s$ we can compute $\hat\beta_1$, (5.2.19), or by iteration $\hat\beta_k$, the $k$-step R estimate.

Exercise 5.5.4 outlines the surprising result that $N^{1/2}(\hat\beta_1 - \beta_0)$ has the same limiting distribution as $N^{1/2}(\hat\beta - \beta_0)$. Hence, the one-step estimate is asymptotically as efficient as the full R estimate found by minimizing $D(Y - X\beta)$; see McKean and Hettmansperger (1978). It should be emphasized, however, that the limiting distribution developed in Theorem 5.2.3 for $N^{1/2}(\hat\beta - \beta_0)$ does not require symmetry of the underlying distribution. On the other hand, if $\hat\tau$ in the construction of $\hat\beta_1$ is based on the Wilcoxon confidence interval computed from the residuals, then symmetry must be assumed for the consistency of $\hat\tau$.

The parameter $\tau = 1/(12^{1/2}\int f^2(x)\,dx)$ has appeared throughout the book. For example, $\tau^{-1}$ is the essential part of the Pitman efficacy of rank tests based on Wilcoxon scores and appears in the asymptotic linear approximation of the rank statistics. Further, $\tau^2$ is in the asymptotic variance or covariance matrix of the Wilcoxon scores R estimates and is the stochastic limit of the standardized squared length of the Wilcoxon confidence interval. It appears in the quadratic approximation of $D(Y - X\beta)$ and will appear later as a standardizing parameter in rank tests based on residuals in the linear model. Hence it is important to have a consistent estimate of $\tau$ without assuming symmetry of the underlying distribution.

We now develop a different estimate of $\tau$ which does not require symmetry. The method is based on an estimate of the density function. We consider the i.i.d. case where $Y_1, \ldots, Y_N$ is a random sample from $F \in \Omega_0$ with pdf $f(\cdot)$. The estimate of $f(x)$, called a kernel or window estimate, was proposed by Rosenblatt (1956) and studied further by Parzen (1962). Wegman (1972) and Bean and Tsokos (1980) provide extensive surveys of the area of density estimation.

Definition 5.2.3. Suppose $F \in \Omega_0$ with pdf $f(x)$. Suppose $w(x)$ is a square integrable density symmetric about 0 and $h_N$ a sequence of constants such that

$$h_N \to 0 \quad\text{and}\quad Nh_N \to \infty.$$

Then

$$f_N(y) = \frac{1}{Nh_N}\sum_{i=1}^N w\left(\frac{y - Y_i}{h_N}\right)$$

is called a window estimator of $f(y)$. The function $w(x)$ is called the window or kernel and $h_N$ is called the window width or bandwidth.

We restrict attention to a uniform window

$$w(x) = \begin{cases} 1 & \text{for } -1/2 < x < 1/2 \\ 0 & \text{otherwise.} \end{cases} \tag{5.2.20}$$

We do this because a uniform window is easy to work with, and Bean and
Tsokos (1980, p. 277) indicate that the choice of window is not nearly so 
crucial as the choice of window width. 

Using the indicator function $I(E) = 1$ if the event $E$ occurs and 0 otherwise, with (5.2.20), we have

$$f_N(y) = \frac{1}{Nh_N}\sum_{i=1}^N I\left(y - \frac{h_N}{2} < Y_i < y + \frac{h_N}{2}\right) = \frac{F_N(y + h_N/2) - F_N(y - h_N/2)}{h_N}, \tag{5.2.21}$$

where $F_N$ is the empirical cdf. Hence, $f_N(y)$ in (5.2.21) is an empirical derivative of the empirical cdf. In Exercise 5.5.5 you are asked to show that $f_N(y)$ converges in probability to $f(y)$.

Based on $f_N$ we can now construct an estimator of $\tau$, (5.2.18). We consider the parameter

$$\gamma = \int f^2(y)\,dy = \int f(y)\,dF(y). \tag{5.2.22}$$


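The estimator is given by

$$\hat\gamma = \int f_N(y)\,dF_N(y) = \frac{1}{N^2h_N}\sum_{i=1}^N\sum_{j=1}^N w\left(\frac{Y_i - Y_j}{h_N}\right) = \frac{1}{N^2h_N}\sum_{i=1}^N\sum_{j=1}^N I_{ij}, \tag{5.2.23}$$

where $I_{ij} = 1$ if $-h_N/2 < Y_i - Y_j < h_N/2$ and 0 otherwise, from (5.2.20).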
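
A direct transcription of (5.2.23) may help make the double sum concrete. This is a minimal sketch for the uniform window only; the function names `gamma_hat` and `tau_from_gamma` are hypothetical, and the bandwidth `h` is left to the caller.

```python
import numpy as np

def gamma_hat(y, h):
    """Window estimate (5.2.23) of the integral of f^2, uniform window (5.2.20)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    diffs = y[:, None] - y[None, :]        # all N^2 differences Y_i - Y_j
    inside = np.abs(diffs) < h / 2.0       # indicators I_ij, including the N terms i = j
    return inside.sum() / (n ** 2 * h)

def tau_from_gamma(gamma):
    """tau = 1 / (sqrt(12) * gamma), as in (5.2.18)."""
    return 1.0 / (np.sqrt(12.0) * gamma)
```

For the three observations 0, 1, 3 with `h = 3`, only the pair (0, 1) lies within `h/2`, so five of the nine indicators equal one and `gamma_hat` returns 5/27.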

The estimate $\hat\gamma$, (5.2.23), was first proposed by Schuster (1974) and studied extensively, for the case of a random sample, by Schweder (1975). Schweder (1975) discusses the asymptotic normality of $\hat\gamma$ based on general windows and general scores. He further discusses several practical considerations in the implementation of the estimator. Bhattacharyya and Roussas (1969) study $\int f_N^2(x)\,dx$, and Cheng and Serfling (1981) extend this estimate to the general scores case. It is possible to show when using the Wilcoxon scores that the Bhattacharyya-Roussas estimate is a special case of the estimate $\hat\gamma$; see Exercise 5.5.6. Hence we restrict our attention to $\hat\gamma$, (5.2.23).


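Theorem 5.2.4. Suppose $Y_1, \ldots, Y_N$ are i.i.d. $F \in \Omega_0$ with pdf $f(\cdot)$. Suppose $\int f^2(x)\,dx < \infty$. Then $\hat\gamma$, (5.2.23), converges in probability to $\gamma = \int f^2(x)\,dx$.

Proof. We first compute the expectation

$$E\hat\gamma = \frac{1}{N^2h_N}\sum_{i=1}^N\sum_{j=1}^N EI_{ij} = \frac{1}{N^2h_N}\left[N + N(N-1)P(|Y_1 - Y_2| < h_N/2)\right].$$

Let $Z = Y_1 - Y_2$; then, from Theorem A17,

$$F^*(t) = P(Z \le t) = \int F(t + y)f(y)\,dy$$

with pdf

$$f^*(t) = \int f(t + y)f(y)\,dy.$$

Hence

$$E\hat\gamma = \frac{1}{Nh_N} + \frac{N-1}{N}\cdot\frac{F^*(h_N/2) - F^*(-h_N/2)}{h_N}$$

and

$$E\hat\gamma \to f^*(0) = \int f^2(y)\,dy = \gamma.$$

Now, since $I_{ij} = I_{ji}$ and the $i = j$ terms are constant,

$$\operatorname{Var}\hat\gamma = \frac{1}{N^4h_N^2}\operatorname{Var}\Big(\sum\sum_{i \ne j} I_{ij}\Big) = \frac{2(N-1)}{N^3h_N^2}\operatorname{Var}I_{12} + \frac{4(N-1)(N-2)}{N^3h_N^2}\left(EI_{12}I_{13} - EI_{12}EI_{13}\right).$$

Recall $Nh_N \to \infty$; then using the same argument as in Theorem 2.5.1 and Theorem 2.7.2, we have

$$\operatorname{Var}\hat\gamma \doteq \frac{4}{Nh_N^2}\left(EI_{12}I_{13} - EI_{12}EI_{13}\right).$$

Now,

$$\frac{4}{Nh_N^2}\left|EI_{12}I_{13} - EI_{12}EI_{13}\right| \le \frac{4}{Nh_N^2}\left[EI_{12} + (EI_{12})^2\right] = \frac{4}{Nh_N}\cdot\frac{EI_{12}}{h_N}\left[1 + EI_{12}\right] \to 0$$

since $Nh_N \to \infty$, $1/(Nh_N) \to 0$, $h_N^{-1}EI_{12}$ converges to $\gamma$, and $1 + EI_{12}$ is bounded above by 2.

Thus we have $E\hat\gamma \to \gamma$ and $\operatorname{Var}\hat\gamma \to 0$, so $\hat\gamma$ is a consistent estimate of $\gamma$ by Theorem A5.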


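We now discuss some practical issues that are necessary for the implementation of $\hat\gamma$. Note that the $i = j$ terms in (5.2.23) are nonrandom and $\hat\gamma$ can be written

$$\hat\gamma = \frac{1}{Nh_N} + \frac{1}{N^2h_N}\sum\sum_{i \ne j} w\left(\frac{Y_i - Y_j}{h_N}\right).$$

We will consider the following modification of $\hat\gamma$,

$$\gamma^* = \frac{1}{NK} + \frac{1}{N(N-1)h_N}\sum\sum_{i \ne j} w\left(\frac{Y_i - Y_j}{h_N}\right), \tag{5.2.24}$$

where $K$ is a constant. Obviously, Theorem 5.2.4 holds also for $\gamma^*$, and $\gamma^*$ is a consistent estimator of $\gamma$. Now if $f^*(z) = \int f(y + z)f(y)\,dy$ is the pdf of $Y_1 - Y_2$, then

$$E\gamma^* = \frac{1}{NK} + \frac{1}{h_N}\int w\left(\frac{z}{h_N}\right)f^*(z)\,dz = \frac{1}{NK} + \int w(u)\left[f^*(0) + h_Nuf^{*\prime}(0) + h_N^2u^2f^{*\prime\prime}(0)/2\right]du + h_N^2o(1), \tag{5.2.25}$$

provided $f^{*\prime\prime}(z)$ is continuous at 0. Now since $w(u)$ is symmetric about 0 and $\gamma = f^*(0)$, we have

$$\operatorname{Bias}(\gamma^*) = \frac{1}{NK} + \frac{h_N^2}{2}f^{*\prime\prime}(0)\int u^2w(u)\,du + o(h_N^2). \tag{5.2.26}$$

After differentiating $f^*(z)$ twice, plus an integration by parts, we have

$$f^{*\prime\prime}(0) = -\int_{-\infty}^{\infty}\left[f'(x)\right]^2dx. \tag{5.2.27}$$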
The estimator $\gamma^*$ will be used to standardize test statistics in the next section and we are concerned more about the bias of $\gamma^*$ than efficiency. Hence we wish to let $h_N$ tend to zero as fast as possible. Theory developed by Aubuchon (1982, Section 4.3) for the use of $\gamma^*$ based on dependent residuals in the linear model with random $h_N$ shows that $h_N$ can go to zero no faster than $N^{-1/2}$. Hence we take $h_N = K/N^{1/2}$; then the bias, ignoring the $o(h_N^2)$ term, becomes

$$\frac{1}{N}\left[\frac{1}{K} + \frac{K^2}{2}f^{*\prime\prime}(0)\int u^2w(u)\,du\right]. \tag{5.2.28}$$

The bias is thus of the order $1/N$, and setting (5.2.28) equal to zero and solving for $K$ yields

$$K = \left\{\frac{2}{\int[f'(x)]^2dx\int u^2w(u)\,du}\right\}^{1/3}. \tag{5.2.29}$$

Let $f(y) = \delta^{-1}f_1(\delta^{-1}y)$; then from (5.2.27)

$$\int[f'(x)]^2dx = \delta^{-3}\int[f_1'(x)]^2dx$$

and

$$K = \delta\left\{\frac{2}{\int[f_1'(x)]^2dx\int u^2w(u)\,du}\right\}^{1/3}.$$

Thus the estimate $\gamma^*$, (5.2.24), with $h_N = K/N^{1/2}$, where $K$ is any fixed constant, is a consistent estimate of $\gamma$. The bias of $\gamma^*$ tends to zero at the rate $1/N$ and we have shown previously that if we know the form of $f(\cdot)$ we could choose $K$, (5.2.29), to remove the $1/N$ bias terms. Of course we do not know $f(\cdot)$. We have separated out two components in (5.2.29): a scale parameter $\delta$ and the standardized form of $f(\cdot)$, namely $f_1(\cdot)$. Schweder (1975) suggests using the data to estimate both $\delta$ and $\int[f_1'(x)]^2dx$. A simpler approach is to take $\delta$ to be a robust scale parameter, such as the mean absolute deviation from the mean or the interquartile range, and estimate it from the data while assuming a distribution such as normal for computing $\int[f_1'(x)]^2dx$. The bias is still of the order $1/N$ and we think of the calculations leading to (5.2.29) as providing a practical guide in the problem of selecting $K$ and hence selecting the crucial window width. See Aubuchon and Hettmansperger (1982b) for additional discussion of the properties of $\gamma^*$.

As a simple example we illustrate the calculations for $\delta = F^{-1}(3/4) - F^{-1}(1/4)$, the interquartile range, and $f(\cdot)$, the normal density. The standardized density $f_1(\cdot)$ is that normal density with $\delta = 1$. Since $\delta = \sigma[\Phi^{-1}(3/4) - \Phi^{-1}(1/4)] = \sigma(1.348)$ where $\sigma$ is the standard deviation, $f_1(\cdot)$ is the normal density with mean 0 and variance .5503. A simple calculation then shows $\int[f_1'(x)]^2dx = .3455$. We have been using the rectangular window, (5.2.20), for which $\int u^2w(u)\,du = 1/12$; however, the equations are valid for any smooth, symmetric window. Hence, from (5.2.29) we have $K = 4.1\delta$ and $\delta$ is estimated by the interquartile range of the data.
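
The constants in this example can be checked with a few lines of arithmetic. The sketch below assumes the closed form $\int[f'(x)]^2dx = 1/(4\sqrt{\pi}\,\sigma^3)$ for a normal density with standard deviation $\sigma$, and it assumes (5.2.29) has the cube-root form $K = \delta\{2/(\int[f_1'(x)]^2dx\int u^2w(u)\,du)\}^{1/3}$; the variable names are illustrative only.

```python
from math import pi, sqrt
from statistics import NormalDist

# Interquartile range of a normal distribution with standard deviation 1.
iqr_unit_sigma = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)

# Standard deviation of the normal density f_1 whose interquartile range is 1.
sigma1 = 1.0 / iqr_unit_sigma

# Integral of [f_1'(x)]^2 for that normal density (closed form).
int_f1_prime_sq = 1.0 / (4.0 * sqrt(pi) * sigma1 ** 3)

# Second moment of the uniform window (5.2.20).
int_u2_w = 1.0 / 12.0

# Ratio K / delta implied by the assumed form of (5.2.29).
K_over_delta = (2.0 / (int_f1_prime_sq * int_u2_w)) ** (1.0 / 3.0)
```

Up to rounding of the interquartile-range constant, `int_f1_prime_sq` agrees with the value .3455 quoted above and `K_over_delta` with the multiplier 4.1.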

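Using a rectangular window for $w(\cdot)$ and the sample interquartile range $\hat\delta$ to estimate $\delta$, the estimate of $\gamma$, (5.2.22), is given by

$$\gamma^* = \frac{1}{4.1N\hat\delta} + \frac{\sqrt N}{N(N-1)4.1\hat\delta}\sum\sum_{i \ne j} w\left(\frac{Y_i - Y_j}{4.1\hat\delta/\sqrt N}\right) = \frac{1}{4.1N\hat\delta} + \frac{\sqrt N}{4.1\hat\delta}\left[F_N^*\left(\frac{4.1\hat\delta}{2\sqrt N}\right) - F_N^*\left(-\frac{4.1\hat\delta}{2\sqrt N}\right)\right], \tag{5.2.30}$$

where $F_N^*(\cdot)$ is the empirical cdf computed from the $N(N-1)/2$ differences $Y_i - Y_j$, $1 \le i < j \le N$.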
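
The estimate (5.2.30) is easy to compute from a vector of observations or residuals. The sketch below is illustrative rather than definitive: it assumes the first term of $\gamma^*$ is $1/(N\hat K)$ with $\hat K = 4.1\hat\delta$, and it uses NumPy's default (linearly interpolated) quartiles, which need not agree with the quartile convention used in the book's examples. The names `gamma_star` and `tau_star` are hypothetical.

```python
import numpy as np

def gamma_star(resid, c=4.1):
    """Estimate of the integral of f^2 with bandwidth h = c * IQR / sqrt(N)."""
    resid = np.asarray(resid, dtype=float)
    n = len(resid)
    q1, q3 = np.percentile(resid, [25, 75])   # assumption: NumPy's default quartiles
    K = c * (q3 - q1)                         # K-hat = 4.1 * interquartile range
    h = K / np.sqrt(n)                        # window width
    diffs = np.abs(resid[:, None] - resid[None, :])
    off_diag = (diffs < h / 2.0).sum() - n    # drop the N nonrandom i = j terms
    return 1.0 / (n * K) + off_diag / (n * (n - 1) * h)

def tau_star(resid, c=4.1):
    """tau* = 1 / (sqrt(12) * gamma*)."""
    return 1.0 / (np.sqrt(12.0) * gamma_star(resid, c))
```

For a large sample from a standard normal distribution, `tau_star` should be close to $\sqrt{\pi/3} \approx 1.02$, the value of $\tau$ for that density.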

Recall that our purpose for introducing the estimator $\hat\gamma$ (or $\gamma^*$) was to provide an estimate of $\gamma = \int f^2(x)\,dx$ or $\tau = 1/[12^{1/2}\int f^2(x)\,dx]$ without assuming that $f(\cdot)$ is symmetric. Then, for example, the one-step estimator $\hat\beta_1$, (5.2.19), can be computed without assuming symmetry. In the next section, tests that depend on an estimate of $\gamma$ (or $\tau$) can be constructed without the symmetry assumption. Thus far the discussion has assumed a random sample $Y_1, \ldots, Y_N$. We now turn again to the linear model, (5.2.1) or (5.2.2), with observations $Y_1, \ldots, Y_N$.

Suppose $F \in \Omega_0$ and the assumptions of (5.2.11) hold. Suppose the window estimator satisfies Definition 5.2.3 with $\int u^2w(u)\,du < \infty$, and

is bounded in probability. Then Aubuchon (1982) shows that Theorem 5.2.4 still holds when $\hat\gamma$ is computed from the dependent residuals $Y_1 - x_1'\hat\beta, \ldots, Y_N - x_N'\hat\beta$. Thus, $\hat\gamma$ or $\gamma^*$ provides a consistent estimate of $\gamma = \int f^2(x)\,dx$ without the symmetry assumption. For the general scores case we would need to estimate $\int\varphi'(F(x))f^2(x)\,dx$. An obvious candidate is $\int\varphi'(F_N(x))f_N(x)\,dF_N(x)$. The consistency of this estimator is discussed by Schweder (1975) for the case of a random sample. His results have not yet been extended to the estimator based on residuals in a linear model.
Comparing (5.2.30) to (5.2.21), we see that $\gamma^*$ is an estimate of the density (or derivative) of $F^*$ (the cdf of the difference) at 0. Sievers and McKean (1983) suggest a slightly different estimate. They consider an estimate $\gamma^{**}$ constructed from $H_N(\cdot)$ and $t_\alpha$, where $H_N(\cdot)$ is the empirical cdf of the absolute values of differences of residuals and $t_\alpha$ is the $\alpha$ percentile of $H_N(\cdot)$, $0 < \alpha < 1$. This estimate is also consistent without the symmetry of $f(\cdot)$. Note that if $X_1, X_2$ are i.i.d. $F \in \Omega_0$, then $X_1 - X_2$ has cdf $G \in \Omega_s$. In a designed experiment with several observations per cell, Draper (1983) develops an estimate of $\int f^2(x)\,dx$ without the symmetry of $f(\cdot)$ by combining the lengths of two-sample Hodges-Lehmann confidence intervals formed across the pairs of cells. See (3.2.5) for a description of the confidence interval. There has been no comparative study of the various approaches to the estimation of $\int f^2(x)\,dx$.

To fix the ideas involved in using $\gamma^*$, we outline the calculations needed to produce an approximate 95% confidence interval for $\beta$ in Example 5.1.1. From that example, $\hat\beta = .353$, and the residuals $r_i = Y_i - 0.353x_i$, $i = 1, \ldots, 11$, are $-8.3$, $-6.124$, $-3.670$, $-1.364$, $-0.266$, $-0.004$, $1.035$, $2.985$, $3.195$, $8.805$, and $12.9$. The quartiles are $-3.67$ and $3.195$; the interquartile range is $\hat\delta = 6.865$. To compute $\gamma^*$, (5.2.30), we must form the 55 differences $r_i - r_j$, $1 \le i < j \le 11$. Then, from (5.2.20),

$$w\left(\frac{r_i - r_j}{4.1\hat\delta/\sqrt{11}}\right) = 1 \quad\text{if } |r_i - r_j| < 4.254$$

and $\gamma^* = .003 + .034 = .037$. Then $\hat\tau = 1/(12^{1/2}\gamma^*) = 7.734$. This is comparable to the least-squares $s = 6.48$ based on the residual sum of squares. Now $\Sigma^{-1}$, in Theorem 5.2.3, is replaced by $\{N^{-1}\sum(x_i - \bar x)^2\}^{-1} = .00001$. Then $N^{1/2}(\hat\beta - \beta)$ is approximately normally distributed with variance estimated by $\hat\tau^2\{N^{-1}\sum(x_i - \bar x)^2\}^{-1} = (59.81)(.00001) = .0006$. Finally, the approximate 95% confidence interval is $.353 \pm 2(.0006)^{1/2}/11^{1/2}$ or $(.338, .368)$.

We complete this section with a discussion of the intercept parameter $\alpha$. This parameter is estimated by applying one-sample methods to the residuals $Y_1 - x_1'\hat\beta, \ldots, Y_N - x_N'\hat\beta$, where $\hat\beta$ is the rank estimator developed previously. If we assume symmetry ($F \in \Omega_s$) and use Wilcoxon scores, then the estimate of $\alpha$ would be the median of the Walsh averages of the residuals; see (2.3.2). Suppose assumptions (5.2.11) hold. Suppose $\alpha_0$ and $\beta_0$ denote the true parameter values in the linear model, (5.2.1), and $N^{-1}[1\;X]'[1\;X] \to \Lambda$, full rank. Then Theorem 5.2.3 can be extended immediately to show

$$N^{1/2}\begin{pmatrix}\hat\alpha - \alpha_0 \\ \hat\beta - \beta_0\end{pmatrix} \xrightarrow{D} Z \sim MVN(0, \tau^2\Lambda^{-1}). \tag{5.2.31}$$
This result also holds for one- (or $k$-) step rank estimators, (5.2.19). See McKean and Hettmansperger (1978) for a further discussion.

When we do not wish to assume symmetry ($F \in \Omega_0$), we have to decide what location parameter the intercept represents. We take $\alpha$ to be the median of the distribution of $Y_i - x_i'\beta$. Hence, $\hat\alpha$ is the median of $Y_1 - x_1'\hat\beta, \ldots, Y_N - x_N'\hat\beta$, while $\hat\beta$ is constructed using the Wilcoxon scores.

Under the assumptions in (5.2.11), Aubuchon (1982) shows that

$$N^{1/2}\begin{pmatrix}\hat\alpha - \alpha_0 \\ \hat\beta - \beta_0\end{pmatrix} \xrightarrow{D} Z \sim MVN(0, V) \tag{5.2.32}$$

where

$$V = \begin{pmatrix}[2f(0)]^{-2} + \tau^2\bar x'\Sigma^{-1}\bar x & -\tau^2\bar x'\Sigma^{-1} \\ -\tau^2\Sigma^{-1}\bar x & \tau^2\Sigma^{-1}\end{pmatrix}$$

and $\tau = 1/[12^{1/2}\int f^2(x)\,dx]$, (5.2.18). The asymptotic variance-covariance matrix is developed from projections using sign scores for the intercept and Wilcoxon scores for the regression parameters.
We now turn to the problem of hypothesis testing in the linear model. The basic model is given by (5.2.1) or (5.2.2). Again, as in the previous section, we restrict our attention to the $p \times 1$ vector of regression parameters $\beta$. Partition the vector as

$$\beta' = (\beta_1', \beta_2') \tag{5.3.1}$$

where $\beta_1$ and $\beta_2$ are $(p - q) \times 1$ and $q \times 1$ vectors, respectively. Then the linear model can be written as

$$Y = 1\alpha^* + X_{1c}\beta_1 + X_{2c}\beta_2 + e \tag{5.3.2}$$

where $X_{1c}$ and $X_{2c}$ are $N \times (p - q)$ and $N \times q$ matrices with $[X_{1c}, X_{2c}] = X_c$. Our goal is to construct tests of the hypotheses:

$$H_0: \beta_2 = 0,\ \beta_1 \text{ unspecified} \quad\text{versus}\quad H_A: \beta_2 \ne 0,\ \beta_1 \text{ unspecified}. \tag{5.3.3}$$

An alternative formulation is based on specifying a collection of $q$ linearly independent, estimable functions $H\beta$ where $H$ is a given $q \times p$ matrix. For example, if $H = (O\;I)$, where $I$ is the $q \times q$ identity, then $H\beta = \beta_2$.

The vector $\beta_2$ specifies the parameters under test whereas $\beta_1$ denotes the nuisance parameters. For example, in the examples following (5.1.3), consider the two-way layout with two treatments and two blocks. The test of $H_0: \beta_2 = 0$ versus $H_A: \beta_2 \ne 0$ with $\beta_1$ unspecified is an example with $\beta' = (\beta_1, \beta_2)$, and in our present notation the vectors $\beta_1$ and $\beta_2$ of (5.3.1) each reduce to a single scalar component. This example could easily be expanded to a test for interaction in which the main effects determine the nuisance parameters. This is illustrated in the next section with data.

The framework is quite general. The methods to be developed cover tests for main effects and interactions in multiway layouts, for the significance of covariates in analysis of covariance models, and for coefficients in regression models.

5.3.1. Tests Based on $D(Y - X\beta)$

In the last section we used $D(Y - X\beta)$, (5.2.5), to define an estimate of $\beta$. Hence $D(Y - X\beta)$ is used as a criterion for fitting a linear model to data. In fact $D(Y - X\hat\beta)$ represents the minimum distance, as measured by $D(Y - X\beta)$, from the data vector to the subspace determined by the linear model. Let $\beta_0$ denote the $p \times 1$ vector specified by $H_0: \beta_2 = 0$. Thus we have $\beta_0' = (\beta_1', 0')$ and we can rewrite (5.3.3) as

$$H_0: \beta = \beta_0 \quad\text{versus}\quad H_A: \beta \ne \beta_0. \tag{5.3.4}$$

Further, let $\hat\beta_0$ denote the R estimate of $\beta_0$, that is, the value that minimizes $D(Y - X\beta_0) = D(Y - X_1\beta_1)$. The estimate $\hat\beta_0$ is referred to as the reduced model estimate. The full model estimate is denoted by $\hat\beta$ and it minimizes $D(Y - X\beta)$.

To test the hypothesis $H_0: \beta = \beta_0$ versus $H_A: \beta \ne \beta_0$ we will compare $D(Y - X\hat\beta_0)$ to $D(Y - X\hat\beta)$ and reject $H_0: \beta = \beta_0$ for large values of $D(Y - X\hat\beta_0) - D(Y - X\hat\beta)$. This is the same strategy as that used to develop $F$ tests based on the reduction in sum of squares due to fitting the reduced and full models. To make the test operational we at least need the limiting distribution under the null hypothesis. It should be noted that the test is not distribution free for finite sample sizes. The next theorem shows that the test is, however, asymptotically distribution free under the null hypothesis. The result in the next theorem shows that

$$\frac{D(Y - X\hat\beta_0) - D(Y - X\hat\beta)}{(\tau/2)},$$

where $\tau = 1/(12^{1/2}\int f^2(x)\,dx)$, is asymptotically chi-square with $q$ degrees of freedom. We propose to use

$$\tau^* = 1/(12^{1/2}\gamma^*) \tag{5.3.5}$$

where

$$\gamma^* = \frac{1}{N\hat K} + \frac{1}{N(N-1)h_N}\sum\sum_{i \ne j} w\left(\frac{r_i - r_j}{h_N}\right),$$

$r_i = Y_i - x_i'\hat\beta$, $i = 1, \ldots, N$, $\hat\beta$ is the full-model R estimate of $\beta$, $\hat K$ is an estimate of $K$, and $h_N = \hat K/N^{1/2}$. See (5.2.30) with $\hat K = 4.1\hat\delta$ where $\hat\delta$ is the interquartile range of the residuals. The hypothesis $H_0: \beta = \beta_0$ (or $H_0: \beta_2 = 0$) is rejected at approximate level $\alpha$ if

$$D^* = \frac{D(Y - X\hat\beta_0) - D(Y - X\hat\beta)}{(\tau^*/2)} \ge \chi^2_\alpha(q) \tag{5.3.6}$$

where $\chi^2_\alpha(q)$ is the chi-square critical value with $q$ degrees of freedom. This test is illustrated on data in the next section.
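
Once the reduced- and full-model residuals are in hand, $D^*$ in (5.3.6) is a short computation. The sketch below assumes that the dispersion (5.2.5) has the form $D = \sum a(R(e_i))\,e_i$ with Wilcoxon scores $a(i) = \sqrt{12}\,(i/(N+1) - 1/2)$ and no tied residuals; the fitting of the two models and the estimate $\tau^*$ are left to the caller. The function names are hypothetical.

```python
import numpy as np

def wilcoxon_scores(resid):
    """Scores a(R_i) = sqrt(12) * (R_i / (N + 1) - 1/2); assumes no ties."""
    resid = np.asarray(resid, dtype=float)
    ranks = np.argsort(np.argsort(resid)) + 1
    return np.sqrt(12.0) * (ranks / (len(resid) + 1.0) - 0.5)

def dispersion(resid):
    """Assumed form of the dispersion: sum of score times residual."""
    resid = np.asarray(resid, dtype=float)
    return float(wilcoxon_scores(resid) @ resid)

def drop_in_dispersion(resid_reduced, resid_full, tau_star):
    """D* of (5.3.6): reduction in dispersion divided by tau*/2."""
    return (dispersion(resid_reduced) - dispersion(resid_full)) / (tau_star / 2.0)
```

The value returned by `drop_in_dispersion` would be compared with the chi-square critical value on $q$ degrees of freedom. Because the scores sum to zero, `dispersion` is unchanged when a constant is added to every residual, which is why the intercept plays no role in the test.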


Theorem 5.3.1. Given the linear model, (5.3.2), suppose the null hypothesis $H_0: \beta_2 = 0$ holds. Under the assumptions (5.2.11),

$$D = \frac{D(Y - X\hat\beta_0) - D(Y - X\hat\beta)}{(\tau/2)}$$

has an asymptotic $\chi^2(q)$ distribution, where $\hat\beta_0$, $\hat\beta$ are the reduced- and full-model R estimates and $\tau = 1/(12^{1/2}\int f^2(x)\,dx)$.

Proof. The argument proceeds by approximating $D(\cdot)$ with $Q(\cdot)$, (5.2.12), and by approximating $\hat\beta$, the R estimate, by $\tilde\beta$, (5.2.14), the value that minimizes $Q(Y - X\beta)$. We then show that the asymptotic behavior of $D(Y - X\hat\beta_0) - D(Y - X\hat\beta)$ is determined by that of $Q(Y - X\tilde\beta_0) - Q(Y - X\tilde\beta)$. Finally, the argument is completed by showing that $Q(Y - X\tilde\beta_0) - Q(Y - X\tilde\beta)$, when properly standardized, has an asymptotic chi-square distribution. Throughout this discussion we simplify the notation by writing $D(\beta)$ rather than $D(Y - X\beta)$ and similarly for $Q(\beta)$.

We begin by writing

$$D(\hat\beta_0) - D(\hat\beta) = D(\hat\beta_0) - Q(\hat\beta_0) + Q(\hat\beta_0) - Q(\tilde\beta_0) + Q(\tilde\beta_0) - Q(\tilde\beta) + Q(\tilde\beta) - Q(\hat\beta) + Q(\hat\beta) - D(\hat\beta). \tag{5.3.7}$$
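Under the null hypothesis, by Theorem 5.2.3, both $N^{1/2}(\hat\beta - \beta_0)$ and $N^{1/2}(\hat\beta_0 - \beta_0)$ are asymptotically normally distributed and hence are bounded in probability. Theorem 5.2.2 can now be applied to assert that the first and fifth differences on the right side of (5.3.7) tend to zero in probability, that is, they are $o_p(1)$.

Next, substituting $\hat\beta$ and $\tilde\beta$ into $Q(\beta)$, (5.2.12), we have

$$Q(\hat\beta) - Q(\tilde\beta) = -(\hat\beta - \tilde\beta)'S(\beta_0) + \frac{\sqrt{12}\,(\int f^2(x)\,dx)\,N}{2}\left[(\hat\beta - \beta_0)'\Sigma(\hat\beta - \beta_0) - (\tilde\beta - \beta_0)'\Sigma(\tilde\beta - \beta_0)\right].$$

If the expression in square brackets in the second term is multiplied out and rearranged, it can be written as

$$(\hat\beta - \tilde\beta)'\Sigma(\hat\beta - \beta_0 + \tilde\beta - \beta_0).$$

Hence

$$Q(\hat\beta) - Q(\tilde\beta) = N^{1/2}(\hat\beta - \tilde\beta)'\left[-N^{-1/2}S(\beta_0) + \frac{\sqrt{12}\int f^2(x)\,dx}{2}\,\Sigma N^{1/2}(\hat\beta - \beta_0 + \tilde\beta - \beta_0)\right]. \tag{5.3.8}$$

From the proof of Theorem 5.2.3, $N^{1/2}(\hat\beta - \tilde\beta) = o_p(1)$ and $N^{1/2}(\hat\beta - \beta_0)$ and $N^{1/2}(\tilde\beta - \beta_0)$ are asymptotically normally distributed. Since $N^{-1/2}S(\beta_0)$ is also asymptotically normally distributed (Exercise 5.5.3), the second factor in (5.3.8) is bounded in probability. Hence we conclude that the fourth term in (5.3.7) is $o_p(1)$. A similar argument shows that the second term in (5.3.7) is also $o_p(1)$.

We can now write (5.3.7) as

$$D(\hat\beta_0) - D(\hat\beta) = Q(\tilde\beta_0) - Q(\tilde\beta) + o_p(1).$$

Recall $N^{-1}(X_c'X_c) \to \Sigma$, a positive definite matrix; see assumptions in (5.2.11). Partition $\Sigma$ to correspond to the partition of $\beta$:

$$\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}\end{pmatrix} \tag{5.3.9}$$

where $\Sigma_{11}$ is $(p - q) \times (p - q)$, $\Sigma_{22}$ is $q \times q$, and $\Sigma_{12} = \Sigma_{21}'$.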
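Substitute $\tilde\beta$, given in (5.2.14), into $Q(\beta)$ and simplify to get

$$Q(\tilde\beta) = D(\beta_0) - \frac{1}{2N\sqrt{12}\int f^2(x)\,dx}\,S'(\beta_0)\Sigma^{-1}S(\beta_0).$$

Likewise, using the partition, (5.3.9),

$$Q(\tilde\beta_0) = D(\beta_0) - \frac{1}{2N\sqrt{12}\int f^2(x)\,dx}\,S'(\beta_0)\begin{pmatrix}\Sigma_{11}^{-1} & 0 \\ 0 & 0\end{pmatrix}S(\beta_0).$$

Hence

$$Q(\tilde\beta_0) - Q(\tilde\beta) = \frac{1}{2N\sqrt{12}\int f^2(x)\,dx}\,S'(\beta_0)\left[\Sigma^{-1} - \begin{pmatrix}\Sigma_{11}^{-1} & 0 \\ 0 & 0\end{pmatrix}\right]S(\beta_0),$$

and

$$D = \frac{Q(\tilde\beta_0) - Q(\tilde\beta)}{(\tau/2)} + o_p(1) = \frac{1}{\sqrt N}S'(\beta_0)\left[\Sigma^{-1} - \begin{pmatrix}\Sigma_{11}^{-1} & 0 \\ 0 & 0\end{pmatrix}\right]\frac{1}{\sqrt N}S(\beta_0) + o_p(1). \tag{5.3.10}$$

Using a result on the inverse of a partitioned matrix (see Arnold, 1981, p. 450) we have

$$\Sigma^{-1} - \begin{pmatrix}\Sigma_{11}^{-1} & 0 \\ 0 & 0\end{pmatrix} = (-\Sigma_{21}\Sigma_{11}^{-1}, I)'\,(\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1}\,(-\Sigma_{21}\Sigma_{11}^{-1}, I), \tag{5.3.11}$$

where $(AB, C)$ denotes the matrix partitioned into the product $AB$ and $C$. Then the right side of (5.3.10) becomes

$$\frac{1}{\sqrt N}\left[(-\Sigma_{21}\Sigma_{11}^{-1}, I)S(\beta_0)\right]'(\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1}\frac{1}{\sqrt N}\left[(-\Sigma_{21}\Sigma_{11}^{-1}, I)S(\beta_0)\right]. \tag{5.3.12}$$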
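From Exercise 5.5.3,

$$\frac{1}{\sqrt N}S(\beta_0) \xrightarrow{D} Z \sim MVN(0, \Sigma) \tag{5.3.13}$$

so that

$$\frac{1}{\sqrt N}(-\Sigma_{21}\Sigma_{11}^{-1}, I)S(\beta_0) \xrightarrow{D} V \sim MVN(0, A) \tag{5.3.14}$$

where

$$A = (-\Sigma_{21}\Sigma_{11}^{-1}, I)\,\Sigma\,(-\Sigma_{21}\Sigma_{11}^{-1}, I)' = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}. \tag{5.3.15}$$

Thus, from (5.3.10), $D$ converges in distribution to $V'A^{-1}V$. This is a quadratic form in a multivariate normal vector in which the matrix is the inverse of the covariance matrix. The distribution of such a quadratic form is chi-square with degrees of freedom equal to the rank of $A$; see Arnold (1981, p. 49). But the rank of $A$ is $q$ since it is nonsingular, and the proof is complete.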

This theorem is proved by McKean and Hettmansperger (1976) for general score functions. In that case $\tau^{-1}$ is given by $\int\varphi(u)\varphi_f(u)\,du$ where $\int\varphi(u)\,du = 0$, $\int\varphi^2(u)\,du = 1$ and $\varphi_f(u)$ is given in (3.5.14). A consistent estimate of $\tau$ is based on the length of the one-sample confidence interval constructed from the full-model residuals. This approach to the estimation of $\tau$ requires the symmetry of $F$, the error distribution. The consistency of a window-type estimator for general $\tau$ has not yet been established rigorously. Recall the discussion following (5.2.30).

The key result in establishing the theorem is (5.3.10), which relates $D$ to a quadratic form in the vector of rank statistics $S(\beta_0)$, where $\beta_0' = (\beta_1', 0')$, the value of $\beta$ specified under the null hypothesis. Since $\beta_1$ is generally an unknown vector of nuisance parameters, the quadratic form in $S(\beta_0)$ is not a test statistic. We now provide an example in which $\beta_0$ is completely specified (there are no nuisance parameters) and the test based on $S(\beta_0)$ reduces to a known test.

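Example 5.3.1. Consider the one-way layout. We have $X_{i1}$, $i = 1, \ldots, n_1$; $\ldots$; $X_{ik}$, $i = 1, \ldots, n_k$, independently distributed with $X_{i1} \sim F(x - \theta_1)$, $i = 1, \ldots, n_1$; $\ldots$; $X_{ik} \sim F(x - \theta_k)$, $i = 1, \ldots, n_k$, $F \in \Omega_0$, and we wish to test $H_0: \theta_1 = \cdots = \theta_k$ versus $H_A: \theta_1, \ldots, \theta_k$ not all equal. In order to formulate this problem in terms of a linear model with nonsingular design matrix we let $Y' = (X_{11}, \ldots, X_{n_11}, \ldots, X_{1k}, \ldots, X_{n_kk})$ and

$$X = \begin{pmatrix} 0_{n_1} & 0_{n_1} & \cdots & 0_{n_1} \\ 1_{n_2} & 0_{n_2} & \cdots & 0_{n_2} \\ 0_{n_3} & 1_{n_3} & \cdots & 0_{n_3} \\ \vdots & \vdots & & \vdots \\ 0_{n_k} & 0_{n_k} & \cdots & 1_{n_k} \end{pmatrix}$$

where the subscripts on the entries indicate the length of the vectors. This linear model is related to our original one-way layout as follows:

$$\theta_1 = \alpha$$
$$\theta_2 = \alpha + \beta_2, \qquad \beta_2 = \theta_2 - \theta_1$$
$$\vdots$$
$$\theta_k = \alpha + \beta_k, \qquad \beta_k = \theta_k - \theta_1.$$

In terms of the linear model we will test $H_0: \beta_2 = \cdots = \beta_k = 0$ versus $H_A: \beta_2, \ldots, \beta_k$ not all zero, or $H_0: \beta = 0$ versus $H_A: \beta \ne 0$ where $\beta' = (\beta_2, \ldots, \beta_k)$.

The $X$ matrix has column means $\bar x' = (n_2/N, \ldots, n_k/N)$ where $N = \sum n_i$. Recall $X_c = X - 1\bar x'$ and hence, if $n_i/N \to \lambda_i$, $0 < \lambda_i < 1$, $i = 1, \ldots, k$,

$$\frac{1}{N}X_c'X_c \to \begin{pmatrix} \lambda_2(1 - \lambda_2) & -\lambda_2\lambda_3 & \cdots & -\lambda_2\lambda_k \\ \vdots & \vdots & & \vdots \\ -\lambda_k\lambda_2 & -\lambda_k\lambda_3 & \cdots & \lambda_k(1 - \lambda_k) \end{pmatrix} = \Sigma.$$

The inverse of this matrix is given by Graybill (1969, pp. 170-171) as

$$\Sigma^{-1} = \begin{pmatrix} \lambda_2^{-1} & & 0 \\ & \ddots & \\ 0 & & \lambda_k^{-1} \end{pmatrix} + \lambda_1^{-1}\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix}$$

where $\lambda_1 = 1 - \sum_{i=2}^k\lambda_i$. The vector $S(Y - X\beta_0) = S(0)$ has $j$th component

$$S_j = \sum_{i=1}^{n_j} a(R_{ij}) = \frac{\sqrt{12}\,n_j}{N+1}\left(\bar R_j - \frac{N+1}{2}\right),$$

$j = 2, \ldots, k$, where $R_{ij}$ is the rank of $X_{ij}$ in the combined data array and $\bar R_j = n_j^{-1}\sum_i R_{ij}$. Note that the centered scores sum to zero and $x_{ij} = 1$ as $i$ ranges over the indices in the $j$th sample. The quadratic form on the right side of (5.3.10) can be written as

$$\frac{1}{N}S'(0)\Sigma^{-1}S(0) = \frac{1}{N}\left[\sum_{j=2}^k\frac{S_j^2}{\lambda_j} + \frac{1}{\lambda_1}\Big(\sum_{j=2}^k S_j\Big)^2\right] = \frac{1}{N}\sum_{j=1}^k\frac{S_j^2}{\lambda_j}$$

since $S_1 = -\sum_{j=2}^k S_j$. Now replace $\lambda_j$ by $n_j/N$ and the quadratic form becomes

$$\frac{12}{(N+1)^2}\sum_{j=1}^k n_j\left(\bar R_j - \frac{N+1}{2}\right)^2 = \frac{N}{N+1}H^*$$

where $H^*$ is the Kruskal-Wallis statistic given in Theorem 4.2.4. Hence, in the one-way layout, the quadratic form on the right side of (5.3.10) is essentially the Kruskal-Wallis statistic and we would not use $D^*$, (5.3.6), which requires the estimation of $\tau$. However, this example shows that $D^*$ is asymptotically equivalent to $H^*$.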
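
The identity at the end of Example 5.3.1 can be verified numerically for any data set without ties. The sketch below builds the indicator design for samples $2, \ldots, k$, forms the quadratic form with $\Sigma$ replaced by $N^{-1}X_c'X_c$, and compares it with the Kruskal-Wallis statistic computed from its usual formula $12\sum n_j(\bar R_j - (N+1)/2)^2/[N(N+1)]$. It assumes Wilcoxon scores $a(i) = \sqrt{12}\,(i/(N+1) - 1/2)$; the function names are hypothetical.

```python
import numpy as np

def wilcoxon_scores(values):
    """Scores a(R_i) = sqrt(12) * (R_i / (N + 1) - 1/2); assumes no ties."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values)) + 1
    return np.sqrt(12.0) * (ranks / (len(values) + 1.0) - 0.5)

def one_way_quadratic_form(samples):
    """S'(0) (Xc'Xc)^{-1} S(0) for the one-way layout design of the example."""
    y = np.concatenate(samples)
    k = len(samples)
    labels = np.repeat(np.arange(k), [len(s) for s in samples])
    # Indicator columns for samples 2..k; sample 1 is the baseline.
    X = (labels[:, None] == np.arange(1, k)[None, :]).astype(float)
    Xc = X - X.mean(axis=0)
    S = Xc.T @ wilcoxon_scores(y)
    return float(S @ np.linalg.solve(Xc.T @ Xc, S))

def kruskal_wallis(samples):
    """Kruskal-Wallis statistic without a tie correction."""
    y = np.concatenate(samples)
    n = len(y)
    ranks = np.argsort(np.argsort(y)) + 1
    total, start = 0.0, 0
    for s in samples:
        r = ranks[start:start + len(s)]
        total += len(s) * (r.mean() - (n + 1) / 2.0) ** 2
        start += len(s)
    return 12.0 * total / (n * (n + 1))
```

For untied data, `one_way_quadratic_form(samples)` equals `N / (N + 1) * kruskal_wallis(samples)` up to rounding error, where `N` is the total sample size.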
When there are nuisance parameters present in the model and $\beta_0$ is not fully specified under $H_0$, we estimate $\tau$ and use $D^*$. In the next subsection we show that if estimates of the nuisance parameters are substituted into $S(\beta_0)$, then a test can be based on the quadratic form in (5.3.10) with $S(\beta_0)$ replaced by $S(\hat\beta_0)$. Practical aspects of implementation are discussed with the examples in Section 5.4.


5.3.2. Tests Based on $S(Y - X\beta)$


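From the proof of Theorem 5.3.1, under $H_0: \beta = \beta_0$ where $\beta_0' = (\beta_1', 0')$, we have

$$\frac{1}{\sqrt N}S'(Y - X\beta_0)\left[\Sigma^{-1} - \begin{pmatrix}\Sigma_{11}^{-1} & 0 \\ 0 & 0\end{pmatrix}\right]\frac{1}{\sqrt N}S(Y - X\beta_0)$$
$$= \frac{1}{\sqrt N}\left[(-\Sigma_{21}\Sigma_{11}^{-1}, I)S(Y - X\beta_0)\right]'\left[\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\right]^{-1}\frac{1}{\sqrt N}\left[(-\Sigma_{21}\Sigma_{11}^{-1}, I)S(Y - X\beta_0)\right], \tag{5.3.16}$$

which has a limiting chi-square distribution with $q$ degrees of freedom; see (5.3.10) and (5.3.12). Since $\beta_0$ contains the unspecified $\beta_1$ we cannot base a test on this quadratic form unless we estimate $\beta_1$.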

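The reduced model in which $\beta_2 = 0$ is given by

$$Y = 1\alpha^* + X_{1c}\beta_1 + e. \tag{5.3.17}$$

The reduced-model R estimate, defined by (5.2.8), is denoted by $\hat\beta_{10}$ where the $(p - q) \times 1$ vector of regression rank statistics is denoted by $S_1(Y - X_{1c}\hat\beta_{10})$. Further, note that $S_1(Y - X_{1c}\hat\beta_{10}) \doteq 0$. Since

$$\hat\beta_0 = \begin{pmatrix}\hat\beta_{10} \\ 0\end{pmatrix} \tag{5.3.18}$$

we can also write $S_1(Y - X_c\hat\beta_0)$. We will also let $S_2(Y - X_c\hat\beta_0)$ be the remaining $q$ components of $S(Y - X\hat\beta_0)$. Hence,

$$S(Y - X_c\hat\beta_0) = \begin{pmatrix}S_1(Y - X_c\hat\beta_0) \\ S_2(Y - X_c\hat\beta_0)\end{pmatrix} \doteq \begin{pmatrix}0 \\ S_2(Y - X_c\hat\beta_0)\end{pmatrix}. \tag{5.3.19}$$

Now, if we insert $\hat\beta_0$ into (5.3.16), we have a test statistic. For purposes of constructing the test statistic, we will replace $\Sigma$ by $N^{-1}(X_c'X_c)$, $\Sigma_{11}$ by $N^{-1}(X_{1c}'X_{1c})$, and $\Sigma_{12}$ by $N^{-1}(X_{1c}'X_{2c})$. Then we have, using (5.3.19) and (5.3.16),

$$S^* = S'(Y - X_c\hat\beta_0)\left[(X_c'X_c)^{-1} - \begin{pmatrix}(X_{1c}'X_{1c})^{-1} & 0 \\ 0 & 0\end{pmatrix}\right]S(Y - X_c\hat\beta_0)$$
$$= S_2'(Y - X_c\hat\beta_0)\left[(X_{2c}'X_{2c}) - X_{2c}'X_{1c}(X_{1c}'X_{1c})^{-1}X_{1c}'X_{2c}\right]^{-1}S_2(Y - X_c\hat\beta_0). \tag{5.3.20}$$

It remains to show that the substitution of the reduced model rank estimate for $\beta_0$ does not alter the limiting distribution.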

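Theorem 5.3.2. Given the linear model, (5.3.2), suppose the null hypothesis $H_0: \beta_2 = 0$ holds. Under the assumptions in (5.2.11), $S^*$, (5.3.20), has an asymptotic chi-square distribution with $q$ degrees of freedom.

Proof. We will work with the second form of $S^*$ given in (5.3.20). Recall $\tau$ is given by (5.2.18), and (5.2.10) yields

$$\frac{1}{\sqrt N}S(\beta) = \frac{1}{\sqrt N}S(\beta_0) - \tau^{-1}\frac{1}{N}X_c'X_c\sqrt N(\beta - \beta_0) + o_p(1),$$

for all $\beta$ such that $N^{1/2}\|\beta - \beta_0\| \le C$ for $C > 0$ a constant. Writing this in partitioned form, we have

$$\frac{1}{\sqrt N}\begin{pmatrix}S_1(\beta) \\ S_2(\beta)\end{pmatrix} = \frac{1}{\sqrt N}\begin{pmatrix}S_1(\beta_0) \\ S_2(\beta_0)\end{pmatrix} - \tau^{-1}\frac{1}{N}\begin{pmatrix}X_{1c}'X_{1c} & X_{1c}'X_{2c} \\ X_{2c}'X_{1c} & X_{2c}'X_{2c}\end{pmatrix}\sqrt N\begin{pmatrix}\beta_1 - \beta_{10} \\ \beta_2 - \beta_{20}\end{pmatrix} + o_p(1),$$

where $\beta_0' = (\beta_{10}', 0')$, so that $\beta_{20} = 0$. Since $N^{1/2}(\hat\beta_0 - \beta_0)$ is bounded in probability under $H_0$ by Theorem 5.2.3, we may replace $\beta$ by $\hat\beta_0$ and we have

$$\frac{1}{\sqrt N}S_2(\hat\beta_0) = \frac{1}{\sqrt N}S_2(\beta_0) - \tau^{-1}\frac{1}{N}(X_{2c}'X_{1c})\sqrt N(\hat\beta_{10} - \beta_{10}) + o_p(1). \tag{5.3.21}$$

Further, applying (5.2.10) in the reduced model,

$$0 \doteq \frac{1}{\sqrt N}S_1(\hat\beta_0) = \frac{1}{\sqrt N}S_1(\beta_0) - \tau^{-1}\frac{1}{N}(X_{1c}'X_{1c})\sqrt N(\hat\beta_{10} - \beta_{10}) + o_p(1),$$

and we have

$$\sqrt N(\hat\beta_{10} - \beta_{10}) = \tau\left(\frac{1}{N}X_{1c}'X_{1c}\right)^{-1}\frac{1}{\sqrt N}S_1(\beta_0) + o_p(1).$$

Substituting this into (5.3.21) yields

$$\frac{1}{\sqrt N}S_2(\hat\beta_0) = \frac{1}{\sqrt N}S_2(\beta_0) - \frac{1}{N}(X_{2c}'X_{1c})\left(\frac{1}{N}X_{1c}'X_{1c}\right)^{-1}\frac{1}{\sqrt N}S_1(\beta_0) + o_p(1). \tag{5.3.22}$$

From Exercise 5.5.3 we have

$$\frac{1}{\sqrt N}\begin{pmatrix}S_1(\beta_0) \\ S_2(\beta_0)\end{pmatrix} \xrightarrow{D} Z \sim MVN(0, \Sigma)$$

where $N^{-1}X_c'X_c \to \Sigma$, positive definite. In (5.3.22), replace $N^{-1}X_{2c}'X_{1c}$ by $\Sigma_{21}$, and $N^{-1}X_{1c}'X_{1c}$ by $\Sigma_{11}$. Then

$$\frac{1}{\sqrt N}S_2(\hat\beta_0) = \frac{1}{\sqrt N}S_2(\beta_0) - \Sigma_{21}\Sigma_{11}^{-1}\frac{1}{\sqrt N}S_1(\beta_0) + o_p(1) = (-\Sigma_{21}\Sigma_{11}^{-1}, I)\frac{1}{\sqrt N}S(\beta_0) + o_p(1).$$

This is limiting $MVN(0, \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})$ by (5.3.14) and (5.3.15). Now $S^*$, given by the second equation in (5.3.20), is a quadratic form using the inverse of its covariance matrix and is thus asymptotically chi-square with the degrees of freedom equal to the rank of its matrix, namely $q$. This completes the proof.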
The test of $H_0: \beta_2 = 0$, $\beta_1$ unspecified versus $H_A: \beta_2 \ne 0$, $\beta_1$ unspecified, is carried out as follows. First find the reduced model R estimate of $\beta_1$, denoted by $\hat\beta_{10}$, or $\hat\beta_0' = (\hat\beta_{10}', 0')$. Form the reduced model residuals $Y_i - x_i'\hat\beta_0$, $i = 1, \ldots, N$, and compute the $q \times 1$ vector of rank statistics $S_2(Y - X_c\hat\beta_0) = S_2(Y - X_{1c}\hat\beta_{10})$. Then compute the quadratic form $S^*$, (5.3.20), and reject $H_0$ at approximate level $\alpha$ if $S^* \ge \chi^2_\alpha(q)$ where $\chi^2_\alpha(q)$ is the chi-square critical value with $q$ degrees of freedom. Practical aspects of the computation of $S^*$ are discussed in the next section.
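
The steps just described can be sketched as follows. The function takes the reduced-model residuals as given and evaluates the first form of $S^*$ in (5.3.20), which does not rely on $S_1$ being exactly zero. It assumes Wilcoxon scores $a(i) = \sqrt{12}\,(i/(N+1) - 1/2)$ and no ties; `X1` and `X2` are the (uncentered) design columns for the nuisance and tested parameters. The function names are hypothetical, and how the reduced model is fitted is left to the caller.

```python
import numpy as np

def wilcoxon_scores(resid):
    """Scores a(R_i) = sqrt(12) * (R_i / (N + 1) - 1/2); assumes no ties."""
    resid = np.asarray(resid, dtype=float)
    ranks = np.argsort(np.argsort(resid)) + 1
    return np.sqrt(12.0) * (ranks / (len(resid) + 1.0) - 0.5)

def aligned_rank_statistic(resid0, X1, X2):
    """First form of S* in (5.3.20), computed from reduced-model residuals."""
    X1c = X1 - X1.mean(axis=0)                 # centered nuisance columns
    X2c = X2 - X2.mean(axis=0)                 # centered columns under test
    Xc = np.hstack([X1c, X2c])
    S = Xc.T @ wilcoxon_scores(resid0)         # aligned rank statistics
    S1 = S[: X1.shape[1]]
    full = S @ np.linalg.solve(Xc.T @ Xc, S)
    reduced = S1 @ np.linalg.solve(X1c.T @ X1c, S1)
    return float(full - reduced)
```

By the partitioned-inverse identity (5.3.11), the value returned agrees with the quadratic form in $S_2 - X_{2c}'X_{1c}(X_{1c}'X_{1c})^{-1}S_1$; when $S_1 \doteq 0$ this is the second form of (5.3.20). The result would be compared with the chi-square critical value on $q$ degrees of freedom.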

The vector $S_2(Y - X\hat\beta_0)$ of rank statistics is sometimes called the vector of aligned rank statistics. The original data have been aligned or adjusted by removing the effects of the unspecified nuisance parameters $\beta_1$ before constructing the test for $\beta_2$. If the null hypothesis $H_0$ is true then $S_2(Y - X\hat\beta_0) \doteq 0$. Hence the idea behind the test based on $S^*$ is to assess the size of $S_2(Y - X\hat\beta_0)$. If $S_2(Y - X\hat\beta_0)$ is large then we wish to reject $H_0: \beta_2 = 0$. The standard statistical method of assessing the size of a random vector which has a multivariate normal distribution is to construct a quadratic form in the components of the vector using the covariance matrix. This method yields statistics with chi-square distributions so that critical values for the test can be determined. This is precisely how we developed $S^*$.

Hodges and Lehmann (1962) introduced aligned rank tests. Adichie (1967a, b) considered aligned rank tests in a simple linear regression model. Adichie (1978) gives a proof of Theorem 5.3.2 for general score functions and does not require the reduced model estimate to be an R estimate. He only requires that $N^{1/2}(\hat\beta_{10} - \beta_{10})$ be bounded in probability. Hence, for certain models, the statistic $S^*$ could be constructed using the reduced model least-squares residuals; this is discussed in the next section. Adichie (1978) works directly with the first form of $S^*$ in (5.3.20). Sen and Puri (1977) studied the second form of $S^*$ in (5.3.20). They prove a multivariate version of Theorem 5.3.2 for general score functions in which they use R estimates of the nuisance parameters. The papers of Adichie (1978) and Sen and Puri (1977) contain additional references to work on aligned rank tests.

5.3.3. Tests Based on $\hat\beta$

The third approach to testing $H_0: \beta_2 = 0$, $\beta_1$ unspecified, is based directly on the full model R estimate $\hat\beta$ determined by minimizing $D(Y - X\beta)$ or solving the nonlinear system of equations $S(Y - X\beta) \doteq 0$. It is convenient to use the formulation mentioned below (5.3.3). Let the $q \times p$ matrix $H = (O\;I)$ where $I$ is the $q \times q$ identity matrix. Then $H_0: \beta_2 = 0$, $\beta_1$ unspecified versus $H_A: \beta_2 \ne 0$, $\beta_1$ unspecified may be rewritten as

$$H_0: H\beta = 0 \quad\text{versus}\quad H_A: H\beta \ne 0.$$
The test is based on $H\hat\beta$ by constructing a quadratic form that has an asymptotic chi-square distribution. Exercise 5.5.7 provides the asymptotic distribution of $N^{1/2}(H\hat\beta - H\beta)$. Using the asymptotic covariance matrix, the quadratic form

$$\frac{\sqrt N(H\hat\beta)'\left[H\Sigma^{-1}H'\right]^{-1}\sqrt N(H\hat\beta)}{\tau^2},$$

where $\tau$ is given by (5.2.18), has an asymptotic chi-square distribution with $q$ degrees of freedom.

Replacing $\Sigma$ by $N^{-1}X_c'X_c$ and using $\tau^*$, (5.3.5), the test statistic is

$$B^* = \frac{(H\hat\beta)'\left[H(X_c'X_c)^{-1}H'\right]^{-1}(H\hat\beta)}{(\tau^*)^2}. \tag{5.3.23}$$

Then $H_0: \beta_2 = 0$, $\beta_1$ unspecified, is rejected at approximate level $\alpha$ if $B^* \ge \chi^2_\alpha(q)$, where $\chi^2_\alpha(q)$ is the chi-square critical value. The statistic $B^*$ is called a Wald statistic after Abraham Wald, who first proposed constructing quadratic forms in estimates using their asymptotic covariance matrices.
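
Given a full-model estimate and an estimate of $\tau$, the Wald statistic (5.3.23) is a direct matrix computation. This is a minimal sketch; the function name `wald_statistic` is hypothetical, and `X` is assumed to be the $N \times p$ matrix of regressors without the column of ones.

```python
import numpy as np

def wald_statistic(beta_hat, X, H, tau):
    """B* of (5.3.23) for the hypothesis H beta = 0."""
    Xc = X - X.mean(axis=0)                            # centered design matrix
    Hb = H @ beta_hat                                  # estimated contrasts
    core = H @ np.linalg.solve(Xc.T @ Xc, H.T)         # H (Xc'Xc)^{-1} H'
    return float(Hb @ np.linalg.solve(core, Hb)) / tau ** 2
```

In simple regression with `H = [[1.0]]` the statistic reduces to $\hat\beta^2\sum(x_i - \bar x)^2/\tau^2$; for example, with $x = (0, 1, 2, 3)$, $\hat\beta = 0.4$, and $\tau = 2$ it equals $0.16 \times 5/4 = 0.2$.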

This result is valid for general scores. The remarks following the proof of 
Theorem 5.3.1 are relevant for this case also. 

The three test statistics $D^*$, (5.3.6); $S^*$, (5.3.20); and $B^*$, (5.3.23), have the same asymptotic distribution under the null hypothesis. Pitman efficiency is developed in the linear model by considering the limiting distributions, as a sequence of alternatives converges to the null hypothesis, similar to the location models considered earlier. When the statistics have asymptotic chi-square distributions, the efficiency is defined by the ratio of noncentrality parameters. The results in the papers by McKean and Hettmansperger (1976), Sen and Puri (1977), and Adichie (1978) show that for the three tests, the efficiency of any one of these tests relative to the least-squares $F$ test is

$$e(\text{Rank}, LS) = 12\sigma^2\left[\int f^2(x)\,dx\right]^2. \tag{5.3.24}$$

Thus the three tests are asymptotically equivalent in the sense of Pitman efficiency and they inherit the efficiency of the Wilcoxon signed rank test, the Mann-Whitney-Wilcoxon test, and the Kruskal-Wallis test. This shows that in the two-way layout these rank tests recover the efficiency that is lost by the Friedman test; see the discussion surrounding (4.3.8).
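
The efficiency (5.3.24) is easily evaluated for a given error density. The sketch below uses simple trapezoidal integration on a finite grid, which is an approximation chosen for illustration; the function name `wilcoxon_efficiency` and the grid limits are assumptions of this example.

```python
import numpy as np

def wilcoxon_efficiency(pdf, lo, hi, n_grid=200001):
    """Approximate 12 * sigma^2 * (integral of f^2)^2 on the grid [lo, hi]."""
    x = np.linspace(lo, hi, n_grid)
    f = pdf(x)
    dx = x[1] - x[0]

    def integrate(g):
        # Trapezoidal rule on the equally spaced grid.
        return float(np.sum(g[1:] + g[:-1]) * dx / 2.0)

    mean = integrate(x * f)
    var = integrate((x - mean) ** 2 * f)
    return 12.0 * var * integrate(f * f) ** 2

def normal_pdf(x):
    return np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

def laplace_pdf(x):
    return 0.5 * np.exp(-np.abs(x))
```

For the normal density the result is $3/\pi \approx .955$, and for the double exponential density it is 1.5, in line with the small efficiency loss at the normal model and the gain at heavier-tailed models.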

It should be recalled that the Friedman test does have certain attractive features such as being distribution free and not just asymptotically distribution free, being easy to compute by hand, and being useful in repeated measures designs.

As in the location models discussed earlier, the rank tests and estimates
have the same efficiency relative to their least-squares counterparts. By
Equation (5.2.17), the efficiency of the R estimate relative to the
least-squares estimate is the same as (5.3.24).

Our preference among $D^*$, $S^*$, and $B^*$ is for $D^*$. The statistic $D^*$ and
the rank estimate $\hat\beta$ both derive in a natural way from the data fitting
criterion $D(Y - X\beta)$. The even, location-free measure $D(Y - X\beta)$, (5.2.5),
is a pseudonorm and provides a geometric interpretation of the test and
estimate that is similar to least squares. See Exercise 5.5.10. Further
discussion of the geometry of robust estimates and tests can be found in
McKean and Schrader (1980) and Hettmansperger and McKean (1983).

The result stated in (5.3.24) also extends to general scores tests. The
discussion of Example 3.5.2 carries over from the two-sample location
model to the general linear model.

In the next section we discuss the implementation of these tests. Practical
issues such as computation and adjustments needed for small-to-moderate
sample sizes are discussed. The tests and estimates are illustrated on data
arising from regression and analysis of variance models.

Before turning to the examples, we outline in some detail the
implementation of the Winsorized Wilcoxon score function introduced in
Example 2.8.2 for the one-sample problem and extended to the two-sample
case in Exercise 3.7.8. The results at the end of Section 2.9 suggest that
Winsorization is advantageous from the point of view of robustness; see
Example 2.9.5. With the exception of occasional remarks, this chapter has
dealt solely with the Wilcoxon score function, (5.2.9). We choose to discuss
the Winsorized Wilcoxon score function because estimates and tests based on
it are available through the RANK REGRESSION command in the Minitab
statistical computing system.

The solution to Exercise 3.7.8 can be written in the following
standardized form:

$$\phi(u) = A(\gamma)\times
\begin{cases}
-(1-\gamma)/2, & 0 < u < \gamma/2,\\
u - 1/2, & \gamma/2 \le u \le 1 - \gamma/2,\\
(1-\gamma)/2, & 1 - \gamma/2 < u < 1,
\end{cases} \qquad (5.3.25)$$

where

$$A(\gamma) = \sqrt{\frac{12}{(1-\gamma)^{2}(1+2\gamma)}}. \qquad (5.3.26)$$

Then $\int_0^1 \phi^2(u)\,du = 1$ and $\int_0^1 \phi(u)\,du = 0$. From Example 2.9.1, the
parameter $\tau^{-1} = \sqrt{12}\int f^2(x)\,dx$ is replaced by

$$\tau^{-1} = \frac{\sqrt{12}\int_a^b f^{2}(x)\,dx}{\sqrt{(1-\gamma)^{2}(1+2\gamma)}} \qquad (5.3.27)$$

where $a = F^{-1}(\gamma/2)$ and $b = F^{-1}(1 - \gamma/2)$. Note that $\gamma$ represents the
fraction of sign-type scores used in the Winsorization of the Wilcoxon
scores. When $\gamma = 0$ the equations reduce to the Wilcoxon case and when $\gamma$
approaches 1 the equations yield the $L_1$ case.
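
The standardization of the Winsorized Wilcoxon scores can be checked numerically. The Python sketch below codes the score function with the normalizing constant $\sqrt{12/[(1-\gamma)^2(1+2\gamma)]}$ and approximates $\int\phi$ and $\int\phi^2$ by a midpoint rule. The function names and the choice $\gamma = 1/3$ are assumptions for illustration.

```python
import math

def winsorized_wilcoxon_score(u, gamma):
    """Standardized Winsorized Wilcoxon score at u in (0, 1), for 0 <= gamma < 1."""
    scale = math.sqrt(12.0 / ((1.0 - gamma) ** 2 * (1.0 + 2.0 * gamma)))
    if u < gamma / 2.0:
        raw = -(1.0 - gamma) / 2.0       # lower sign-type plateau
    elif u > 1.0 - gamma / 2.0:
        raw = (1.0 - gamma) / 2.0        # upper sign-type plateau
    else:
        raw = u - 0.5                    # Wilcoxon part in the middle
    return scale * raw

def midpoint_integral(g, steps=100_000):
    """Midpoint-rule approximation to the integral of g over (0, 1)."""
    h = 1.0 / steps
    return h * sum(g((k + 0.5) * h) for k in range(steps))

gamma = 1.0 / 3.0
mean_score = midpoint_integral(lambda u: winsorized_wilcoxon_score(u, gamma))
mean_square = midpoint_integral(lambda u: winsorized_wilcoxon_score(u, gamma) ** 2)
```

With $\gamma = 0$ the function reduces to the Wilcoxon score $\sqrt{12}(u - 1/2)$, which gives a quick sanity check on the constant.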

Now using $a(i) = \phi[i/(N + 1)]$ in (5.2.5), the dispersion function $D(\cdot)$ is
defined, and the corresponding R estimate $\hat\beta$ is found by minimizing
$D(Y - X\beta)$. Theorem 5.2.3 remains valid and $\sqrt{N}(\hat\beta - \beta)$ is
asymptotically $MVN(0, \tau^2\Sigma^{-1})$. If we suppose that $F \in \Omega_s$, then the
parameter $\tau$ will be estimated consistently by $N^{1/2}L/(2Z_{\alpha/2})$, where
$1 - \Phi(Z_{\alpha/2}) = \alpha/2$ and $L$ is the length of the confidence interval
constructed from the Winsorized Wilcoxon signed rank statistic applied to
the residuals $Y_i - x_i'\hat\beta$, $i = 1, \ldots, N$. This approach is described in
detail in Example 2.8.4. If we may only suppose that $F \in \Omega_0$, then a density
estimation approach could be used. The Schweder (1975) type estimate of
$\int_a^b f^2(x)\,dx$ is given by

$$\int I(\hat a \le x \le \hat b)\,\hat f_N(x)\,dF_N(x) = \int_{\hat a}^{\hat b} \hat f_N(x)\,dF_N(x)$$

where $\hat a$, $\hat b$ are the $\gamma/2$ and $1 - \gamma/2$ sample percentiles, $F_N$ is the
empirical cdf, and $\hat f_N$ is a window-type density estimate, all computed
from the full-model residuals. A rigorous proof of consistency, specifying
all the necessary regularity conditions, has not been given for the
density-type estimate in the linear-model case for general scores.

For a specified value of $\gamma$, the RANK REGRESSION command in the Minitab
statistical computing system will produce the R estimates and tests based
on either $D^*$, (5.3.6), or $B^*$, (5.3.23), using either the confidence
interval or the density estimate of $\tau$. Except for the density estimate
approach, the proofs for the results in the linear model with general
scores can be found in Jaeckel (1972) and McKean and Hettmansperger (1976).

The computations involved in determining a rank estimate or test in the
linear model are generally quite tedious and are carried out using a
computer. From Theorem 5.2.1 we know that $D(Y - X\beta)$ is continuous and
convex, so that methods such as steepest descent can be used to locate the
minimum. An iterative method based on the idea in the construction of
$k$-step estimates, (5.2.19), is also useful. Since the minimizing value is
not always unique, the numerical answer will depend on which solution the
algorithm picks. If we have a single sample, this corresponds to taking any
value between the middle two order statistics as the sample median when
the sample size is even. In addition, since iterative methods are used to
find the minimum, the tolerances set to determine when to stop will have
some effect on the numerical answers. Discussion of the computational
aspects of finding rank estimates and tests can be found in Aubuchon and
Hettmansperger (1982a) and Hettmansperger and McKean (1983) and their
references. For data analysis, the Minitab computing system provides a
RANK REGRESSION command to compute R estimates and tests based on $D^*$
and $B^*$. Wilcoxon and Winsorized Wilcoxon scores are available along with
either the confidence interval approach or the density estimate approach
to estimating $\tau$.
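
To make the minimization concrete, here is a minimal Python sketch for the special case of a single regressor. It is not the algorithm used by Minitab or by the references above. It relies on the fact that, for one slope $b$, the dispersion $D(y - bx)$ is convex and piecewise linear with breakpoints at the pairwise slopes, so a minimum can be found by evaluating $D$ at each pairwise slope. The function names and the small data set are hypothetical.

```python
import math

def wilcoxon_scores(n):
    """Scores a(i) = sqrt(12) * (i/(n+1) - 1/2), i = 1..n; they sum to zero."""
    return [math.sqrt(12.0) * (i / (n + 1) - 0.5) for i in range(1, n + 1)]

def dispersion(residuals):
    """Rank dispersion: sum of a(rank of e_i) * e_i over the residuals."""
    scores = wilcoxon_scores(len(residuals))
    # Pairing sorted residuals with increasing scores equals pairing by rank.
    return sum(a * e for a, e in zip(scores, sorted(residuals)))

def rank_slope(x, y):
    """Minimize D(y - b*x) over b by checking every pairwise slope."""
    n = len(x)
    candidates = [(y[j] - y[i]) / (x[j] - x[i])
                  for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    return min(candidates,
               key=lambda b: dispersion([yi - b * xi for xi, yi in zip(x, y)]))

# Hypothetical data: y = 2x + 1 exactly, except for one gross outlier.
x = [1, 2, 3, 4, 5, 6, 7]
y = [3, 5, 7, 9, 11, 13, 40]
slope = rank_slope(x, y)
```

Because the scores sum to zero, the dispersion does not depend on the intercept, which must be estimated separately from the residuals. For several regressors a search over breakpoints is no longer practical, and iterative methods of the kind described above are needed.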

In addition to the numerical issues, there are statistical issues that
require some attention. Simulation studies by McKean and Hettmansperger
(1978) and Hettmansperger and McKean (1977, 1983) indicate that the
nominal significance levels of the rank tests are quite close to the true
levels when the chi-square critical point $\chi^2_\alpha(q)$ is replaced by $q$ times
the F critical point $F_\alpha(q, N - p - 1)$. Typically, the chi-square critical
value, derived from the asymptotic theory, allowed too many rejections
under the null hypothesis and inflated the significance level. The F
critical value corrected this problem quite well in all cases studied. A
bias correction is generally applied to the confidence interval estimate
of $\tau$. For example, $\hat\tau$ may be multiplied by $[N/(N - p - 1)]^{1/2}$, a factor
suggested by the least-squares estimate of $\sigma$. Bias correction of the
density estimate approach to estimating $\tau$ is discussed in the
construction of $\gamma^*$ in (5.2.24).
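
The confidence interval estimate of $\tau$ and its bias correction amount to a short calculation. The Python sketch below computes $N^{1/2}L/(2Z_{\alpha/2})$ and, when the number of regressors $p$ is supplied, multiplies by $[N/(N-p-1)]^{1/2}$. The function name, its arguments, and the interval length used in the example are hypothetical.

```python
from statistics import NormalDist

def tau_from_interval_length(length, n, alpha=0.05, p=None):
    """Estimate tau from the length of a (1 - alpha) confidence interval.

    length : length L of the interval built from the signed rank statistic
    n      : number of residuals N
    p      : number of regression coefficients; if given, apply the
             [N / (N - p - 1)]**0.5 bias correction
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # Z_{alpha/2}
    tau = (n ** 0.5) * length / (2.0 * z)
    if p is not None:
        tau *= (n / (n - p - 1)) ** 0.5
    return tau

# Hypothetical interval length for N = 21 residuals and p = 3 regressors.
tau_raw = tau_from_interval_length(2.0, 21)
tau_corrected = tau_from_interval_length(2.0, 21, p=3)
```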

The following examples have been chosen from the literature, where the
reader can find different perspectives and other analyses. We provide the
least-squares solution for purposes of comparison.

Example 5.4.1. This example is intended to illustrate the estimation of
regression coefficients. The data, given in Table 5.2, consist of 21
observations on a response variable $Y$ and three independent variables
$X_1$, $X_2$, and $X_3$ relevant to the study of a manufacturing plant that
oxidizes ammonia to nitric acid. The variables are

$Y$   = 10 times the percent ammonia lost up the stack in the process
$X_1$ = rate of the operation, measured by air flow
$X_2$ = temperature of the cooling water used in the process
$X_3$ = concentration of nitric acid in the absorbing liquid (coded by minus
        50, times 10).

Table 5.2. Stack Loss Data

           Air      Cooling Water        Acid             Stack
           Flow     Inlet Temperature    Concentration    Loss
Run No.    X1       X2                   X3               Y
 1         80       27                   89               42
 2         80       27                   88               37
 3         75       25                   90               37
 4         62       24                   87               28
 5         62       22                   87               18
 6         62       23                   87               18
 7         62       24                   93               19
 8         62       24                   93               20
 9         58       23                   87               15
10         58       18                   80               14
11         58       18                   89               14
12         58       17                   88               13
13         58       18                   82               11
14         58       19                   93               12
15         50       18                   89                8
16         50       18                   86                7
17         50       19                   72                8
18         50       19                   79                8
19         50       20                   80                9
20         56       20                   82               15
21         70       20                   91               15

We will consider the equations $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i$,
$i = 1, \ldots, 21$. The dependent variable $Y$ is a measure of the inefficiency
of the process.

Least-squares analyses are given by Brownlee (1965, p. 454), Draper and
Smith (1981, p. 361), and Daniel and Wood (1971, Chapter 5). Daniel and
Wood, after an elaborate analysis, set aside four observations (numbers 1,
3, 4, and 21) as outliers corresponding to transient states in the process.
The $X_3$ variable was eliminated as nonsignificant. The analysis then
proceeded on the remaining 17 observations.

Figure 5.1. Least-squares plot of residuals versus fitted values.

The least-squares analysis is provided by inputting the data from Table
5.2 into the computer and regressing the $Y$ column on the three variables
in columns 2 through 4 using the REGRESS command in Minitab. A plot of
residuals versus fitted values is provided in Fig. 5.1. Note the four
extreme residuals and the downward tilt pattern in the remaining points.

Andrews (1974), using M estimation, and Hettmansperger and McKean
(1977), using R estimation, while retaining the four spurious points, were
able to achieve essentially the same fit as that produced by least squares
without the four points.

In Table 5.3 we list the estimates of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ for various
score functions. The target for comparison is the line corresponding to
least squares without outliers. Estimates of the standard deviations of the
estimates are computed from $\hat\tau^2(X_c'X_c)^{-1}$, suggested by Theorem 5.2.3.
Since $\hat\sigma^2/\hat\tau^2$ is an estimate of the efficiency of the rank method relative
to the least-squares method, smaller values of $\hat\tau$ indicate that the data
is selecting a more efficient score function. This informal view of letting
the sample select a score function suggests that the $L_1$ fit, with the
resulting estimates corresponding to minimizing the sum of absolute values
of the residuals, is the best among those considered. The estimates
corresponding to the sign scores are closest to least squares without
outliers. Denby and Mallows (1977) reach the same conclusion using
diagnostic plots in robust regression.

Table 5.3. Estimates of the Parameters

                    $\hat\beta_0$   $\hat\beta_1$       $\hat\beta_2$       $\hat\beta_3$        $\hat\tau$
Least squares       -39.9   0.72(0.17)^a   1.30(0.37)   -0.15(0.16)
Least squares       -37.6   0.80(0.07)     0.58(0.17)   -0.07(0.06)
  (w/o outliers)
Wilcoxon            -40.1   0.82(0.13)     0.89(0.35)   -0.12(0.15)   3.06
33% Winsorized      -38.0   0.85(0.10)     0.72(0.28)   -0.13(0.12)   2.49
50% Winsorized      -38.5   0.85(0.08)     0.67(0.22)   -0.11(0.09)   2.00
Sign ($L_1$)          -39.7   0.83(0.06)     0.58(0.14)   -0.06(0.06)   0.89
Andrews             -37.2   0.82(0.05)     0.52(0.12)   -0.07(0.0?)

^a Number in parentheses is the estimate of the standard deviation.

Unfortunately, we cannot expect, in general, to have a least-squares 
analysis as sensitive as the one provided by Daniel and Wood. A reason- 
able strategy would be to use the Wilcoxon or Winsorized Wilcoxon score 
function, for example, the 33% Winsorized Wilcoxon suggested by Example 
2.9.5. If the results are in conflict with a standard least-squares analysis, 
then a deeper inspection of the experiment is in order to try to uncover the 
sources of the differences. 

The estimates of the standard deviations for the R estimates are based
on the length of confidence interval method. This requires that we assume
symmetry of the error distribution. In the case of Wilcoxon scores, for
example, the confidence interval method yields $\hat\tau = 3.06$. The density
estimation approach yields $\hat\tau = 2.68$. The estimates $\hat\beta_1$, $\hat\beta_2$, and $\hat\beta_3$,
and their distribution theory are not affected by the symmetry assumption.
The intercept estimate $\hat\beta_0$ depends on whether we assume symmetry or not.
Under the symmetry assumption, for the Wilcoxon scores, $\hat\beta_0$ is the median
of the Walsh averages of the full-model residuals; $\hat\beta_0 = -40.10$. Without
the symmetry assumption, $\hat\beta_0$ is simply the median of the residuals;
$\hat\beta_0 = -40.14$. Recall the discussion surrounding (5.2.31) and (5.2.32).

To test for the significance of $\beta_3$, which was eliminated in the
least-squares analysis, we could compare the square of the ratio of the
estimate and its standard deviation to the F critical value $F_\alpha(1, 17)$.
This is precisely the test based on $B^*$, (5.3.23). For Wilcoxon scores,
from Table 5.3, $B^* = (-.12/.15)^2 = .64$, which is not significant for any
reasonable significance level $\alpha$. Again using Wilcoxon scores, but with the
density estimate of $\tau$ and the test based on $D(Y - X\beta)$, we find from the
RANK REGRESSION command in Minitab that $D^*$, (5.3.6), is .81. In fact, a
quick glance at Table 5.3 reveals that all of the rank score tests based on
$B^*$ will fail to reject $H_0 : \beta_3 = 0$ at any reasonable significance level.
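
The "quick glance" can be spelled out as arithmetic. The Python sketch below squares the ratio of each rank-based estimate of $\beta_3$ to its standard deviation, using the entries of Table 5.3, and compares the result with the tabled 5% critical value of F with (1, 17) degrees of freedom. The variable names are hypothetical, and the critical value 4.45 is taken from standard F tables rather than from the text.

```python
# (estimate, standard deviation) for beta_3, copied from Table 5.3.
beta3_fits = {
    "Wilcoxon": (-0.12, 0.15),
    "33% Winsorized": (-0.13, 0.12),
    "50% Winsorized": (-0.11, 0.09),
    "Sign (L1)": (-0.06, 0.06),
}

# Standard table value of the upper 5% point of F(1, 17); not from the text.
F_CRIT_05_1_17 = 4.45

def wald_ratio(estimate, std_dev):
    """Squared ratio of an estimate to its standard deviation (q = 1 case of B*)."""
    return (estimate / std_dev) ** 2

results = {name: wald_ratio(b, s) for name, (b, s) in beta3_fits.items()}
rejects = {name: value > F_CRIT_05_1_17 for name, value in results.items()}
```

Every ratio falls well below the critical value, so none of the four fits rejects $H_0 : \beta_3 = 0$ at the 5% level.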

In summary, we would recommend an equation based on the Wilcoxon
or 33% Winsorized Wilcoxon score function. The respective equations are

$$\hat y = -40.1 + 0.82x_1 + 0.89x_2 - 0.12x_3$$

or

$$\hat y = -38.0 + 0.85x_1 + 0.72x_2 - 0.13x_3.$$

We have not removed the outliers nor dropped the $x_3$ term. Figure 5.2
provides a plot of residuals versus fitted values for the 33% Winsorized
Wilcoxon case. Note the four outliers are more prominent than in the
least-squares plot (Fig. 5.1) and the remaining pattern is improved. An
equation based on sign scores is also attractive and perhaps preferable,
based on estimated efficiency. In any case, the fitted equation developed
from $D(Y - X\beta)$ is better than the least-squares fit on the full data set.
Cheng and Hettmansperger (1983) discuss an alternate computational
approach based on iteratively reweighted least squares. The resulting
estimates for this example are given there and are very close to the
numbers reported in Table 5.3.

Figure 5.2. 33% Winsorized Wilcoxon plot of residuals versus fitted values.

Example 5.4.2. In this example we illustrate the analysis of data arising
from a two-way layout. The analysis parallels a two-way analysis of
variance, and the least-squares analysis is provided for comparison. The
data, from Box and Cox (1964) and given in Table 5.4, consist of the
survival times of 48 animals exposed to three different poisons and four
different treatments. A 3 X 4 factorial design was used with four
observations per cell.

Table 5.4. Survival Times (unit = 10 hours)

                    Treatment
Poison      A       B       C       D
I           0.31    0.82    0.43    0.45
            0.45    1.10    0.45    0.71
            0.46    0.88    0.63    0.66
            0.43    0.72    0.76    0.62
II          0.36    0.92    0.44    0.56
            0.29    0.61    0.35    1.02
            0.40    0.49    0.31    0.71
            0.23    1.24    0.40    0.38
III         0.22    0.30    0.23    0.30
            0.21    0.37    0.25    0.36
            0.18    0.38    0.24    0.31
            0.23    0.29    0.22    0.33

We wish to test for the presence of significant interaction and main
effects. A plot of the cell medians, Fig. 5.3, indicates all effects may be
present. The two most striking features of the plot are (1) the second
treatment seems to be the most efficacious and (2) the third poison is
almost uniformly lethal.
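
The two features read from the plot of cell medians can be checked directly from Table 5.4. The Python sketch below computes the median of each cell and then finds, for each poison, the treatment with the largest median and, for each treatment, the poison with the smallest median. The dictionary layout and variable names are assumptions for illustration.

```python
from statistics import median

# Survival times from Table 5.4, keyed by (poison, treatment).
survival = {
    ("I", "A"): [0.31, 0.45, 0.46, 0.43],
    ("I", "B"): [0.82, 1.10, 0.88, 0.72],
    ("I", "C"): [0.43, 0.45, 0.63, 0.76],
    ("I", "D"): [0.45, 0.71, 0.66, 0.62],
    ("II", "A"): [0.36, 0.29, 0.40, 0.23],
    ("II", "B"): [0.92, 0.61, 0.49, 1.24],
    ("II", "C"): [0.44, 0.35, 0.31, 0.40],
    ("II", "D"): [0.56, 1.02, 0.71, 0.38],
    ("III", "A"): [0.22, 0.21, 0.18, 0.23],
    ("III", "B"): [0.30, 0.37, 0.38, 0.29],
    ("III", "C"): [0.23, 0.25, 0.24, 0.22],
    ("III", "D"): [0.30, 0.36, 0.31, 0.33],
}
poisons = ("I", "II", "III")
treatments = ("A", "B", "C", "D")

cell_medians = {cell: median(times) for cell, times in survival.items()}

# Treatment with the longest median survival within each poison.
best_treatment = {p: max(treatments, key=lambda t: cell_medians[(p, t)])
                  for p in poisons}
# Poison with the shortest median survival within each treatment.
worst_poison = {t: min(poisons, key=lambda p: cell_medians[(p, t)])
                for t in treatments}
```

In these data the second treatment (B) has the largest cell median under every poison, and the third poison has the smallest cell median under every treatment.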

Before describing the analysis, we must develop a linear model with a
nonsingular design matrix for this example. We begin with
$y_{ijk} = \mu + P_i + T_j + I_{ij} + e_{ijk}$, $i = 1, 2, 3$; $j = 1, 2, 3, 4$; $k = 1, 2, 3, 4$.
The conditions $\sum_i P_i = \sum_j T_j = \sum_i I_{ij} = \sum_j I_{ij} = 0$ are usually imposed so
the parameters can be identified. We have used $P_1, P_2, P_3$ and
$T_1, T_2, T_3, T_4$ to denote the poison and treatment effects, respectively.
The term $I_{ij}$ is an interaction term. We will outline the construction of
the design matrix. This is illustrated for the one-way layout in Example
5.3.1. The $Y$ observations will be set in a 48 X 1 vector in the order
$P_1T_1, P_1T_2, \ldots, P_3T_4$, where $P_1T_1$ means data corresponding to the first
poison and first treatment, and so on. Using 1 to denote a 4 X 1 vector of
ones and $D_{ij}$ to denote the 4 X 1 vector of $y$ values in the $(i, j)$ cell, we
can represent the first 16 observations as follows:

         P1   P2   P3   T1   T2   T3   T4
D11      1    0    0    1    0    0    0
D12      1    0    0    0    1    0    0
D13      1    0    0    0    0    1    0
D14      1    0    0    0    0    0    1

The pattern then continues for the remaining cells. In order to convert the
matrix to a nonsingular one and also account for the interaction terms, we
first subtract the $P_3$ column from the $P_1$ and $P_2$ columns and subtract the
$T_4$ column from the $T_1$, $T_2$, and $T_3$ columns. We then have the starred
columns in the following array:

         P1*  P2*  T1*  T2*  T3*  I11  I12  I13  I21  I22  I23
D11      1    0    1    0    0    1    0    0    0    0    0
D12      1    0    0    1    0    0    1    0    0    0    0
D13      1    0    0    0    1    0    0    1    0    0    0
D14      1    0   -1   -1   -1   -1   -1   -1    0    0    0

The $I_{ij}$ column is found by multiplying the $i$th $P^*$-column by the $j$th
$T^*$-column to form the interaction dummy variables. The starred parameters
are the simple contrasts, for example $P_1^* = P_1 - P_3$, used to formulate
hypotheses. The entire data set, consisting of a column of $Y$ values and 11
columns of ones, negative ones, and zeros, is inputted into the computer.
We can now specify various hypotheses of interest. Let
$\beta' = (P_1^*, P_2^*, T_1^*, T_2^*, T_3^*, I_{11}, \ldots, I_{23})$, 11 components. Let the 6 X 11
matrix $H_I$ be given by $H_I = [0, I]$, where $I$ is a 6 X 6 identity matrix.
Then $H_0 : H_I\beta = 0$ specifies the hypothesis of no interaction. Let
$H_P = [I \; 0]$, where $I$ is the 2 X 2 identity matrix. Then $H_0 : H_P\beta = 0$
specifies the hypothesis of no poison effect. Note $H_P\beta = 0$ means
$P_1^* = 0$ and $P_2^* = 0$, so that $P_1^* = P_1 - P_3 = 0$ and $P_2^* = P_2 - P_3 = 0$, and
$P_1 = P_2 = P_3$. The matrix $H_T$ needed to specify the hypothesis of no
treatment effect is left for the reader.
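
The construction of the 11 dummy-variable columns can be sketched in a few lines of Python. The code below follows the recipe in the text: the last level of each factor is coded as $-1$ in every starred column, and each interaction column is the product of a $P^*$ column and a $T^*$ column. The function names are hypothetical, and the column order ($P_1^*, P_2^*, T_1^*, T_2^*, T_3^*, I_{11}, \ldots, I_{23}$) is the one assumed for $\beta$ above.

```python
def contrast_codes(level, n_levels):
    """Dummy codes after subtracting the last level's column from the others."""
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if level == k else 0 for k in range(1, n_levels)]

def design_row(poison, treatment):
    """One row of the 11-column design for a given (poison, treatment) cell."""
    p_star = contrast_codes(poison, 3)        # P1*, P2*
    t_star = contrast_codes(treatment, 4)     # T1*, T2*, T3*
    # Interaction dummies: I11, I12, I13, I21, I22, I23.
    interactions = [p * t for p in p_star for t in t_star]
    return p_star + t_star + interactions

def design_matrix(reps=4):
    """Rows ordered P1T1, P1T2, ..., P3T4 with `reps` observations per cell."""
    rows = []
    for poison in (1, 2, 3):
        for treatment in (1, 2, 3, 4):
            rows.extend(design_row(poison, treatment) for _ in range(reps))
    return rows

X = design_matrix()

# H_I = [0, I] picks out the six interaction coefficients of beta.
def interaction_part(beta):
    return beta[5:]
```

In a balanced layout such as this one every column of `X` sums to zero, so the columns are already centered.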

Minitab was used to make the computations. Since survival time may be
asymmetrically distributed, the density estimate of $\tau$ should be used.
Wilcoxon scores and the test statistic $D^*$, (5.3.6), were applied. In Table
5.5 we compare $D^*/\mathrm{df}$ to $F_\alpha(\mathrm{df}, N - p - 1)$ with $N = 48$ and $p = 11$, as
suggested at the beginning of this section. The column labeled F provides
the classical F-test values.

Table 5.5. Analysis of the Survival Data

Source    df    D*/df    F.05(df, 36)    F
P         2     42.53    3.26            23.22
T         3     22.85    2.87            13.81
P X T     6      3.35    2.36             1.87

$\hat\tau^* = 0.084$, $\hat\sigma = .149$

From Table 5.5, based on the rank test D*, we see a significant 
interaction term and highly significant main effects. On the other hand, the 
F test fails to reject, at the 5% level, the hypothesis of no interaction. In Fig. 
5.4 a plot of residuals versus fitted values reveals a problem in the data. 

The plot shows highly heterogeneous cell variances. Huber, in his 1972 
Wald lecture (see Huber, 1973), points out that “uncontrollable inhomo- 
geneity of variances among the [variables] and genuinely long-tailed error 
distributions have almost indistinguishable effects, both impairing the effi- 
ciency of the [least-squares] estimates.” 

The analysis based on the ranks of the residuals also depends upon the 
assumption of homogeneity of variances. Various authors have suggested 
transformations to correct the problem; see Box and Cox (1964) and Brown 
(1975). The reciprocal has been advocated and Table 5.6 provides the 
analysis for the transformed data. It should be noted that these rank test
statistics in the linear model are not generally invariant under monotone
transformations, as is the case in the simple location model. In the
location model, the data rather than the residuals is ranked. Hence the
rank-based tests can be responsive to transformations of the data.

Figure 5.4. Plot of residuals versus fitted values with Wilcoxon scores.

Table 5.6. Analysis of Reciprocals of Data

Source    df    D*/df    F.05(df, 36)    F
P         2     56.52    3.26            72.56
T         3     25.26    2.87            28.36
P X T     6      1.45    2.36             1.09

$\hat\tau^* = 0.435$, $\hat\sigma = .490$

The situation has improved. The rank and least-squares analyses are
compatible and Fig. 5.5 shows the cell variances to be less heterogeneous.
Our recommended analysis would use $D^*$ based on the Wilcoxon score
function with the density estimate, and performed on the reciprocals of the
data.

We have described the use of $D^*$, the model-fitting criterion, in testing
hypotheses. The approach to data analysis, suggested by this example, is to
compare the rank and least-squares analyses and look deeper into the data
when they disagree. In this case a reciprocal transformation of the data is
suggested.

Figure 5.5. Plot of residuals versus fitted values for reciprocals.

The aligned rank test $S^*$ could also be used. Provided a least-squares
regression algorithm is available, no special programs, beyond the ability
to rank a vector of numbers, are necessary. To test $H_0 : H_I\beta = 0$, no
interaction, we first find the reduced-model least-squares residuals, say
$r_{01}, \ldots, r_{0N}$. Then compute $a(R_1), \ldots, a(R_N)$, where

$$a(R_i) = \sqrt{12}\left(\frac{R_i}{N+1} - \frac{1}{2}\right)$$

and $R_i$ is the rank of $r_{0i}$ among $r_{01}, \ldots, r_{0N}$. The values
$a(R_1), \ldots, a(R_N)$ are then fed into a least-squares program and $S^*$ is the
numerator sum of squares in the F-test statistic for no interaction. When
this is done on the raw data we find $S^* = 14.77$. The chi-square critical
value for 6 degrees of freedom is $\chi^2_{.05}(6) = 12.59$ and we would reject the
null hypothesis of no interaction. For the reciprocal data, $S^* = 7.39$ and
fails to reject at any reasonable level. Hence the analysis based on $S^*$ is
similar to that based on $D^*$ displayed in Tables 5.5 and 5.6. If aligned
rank tests for the main effects are to be computed, then new reduced-model
residuals must be found in each case.
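
The scoring step of the aligned rank test is simple to program. The Python sketch below ranks a vector of reduced-model residuals and converts the ranks to Wilcoxon scores $\sqrt{12}\,[R_i/(N+1) - 1/2]$. The function name and the four residuals are hypothetical, and ties are broken by position, which is an assumption that the text does not address. The resulting scores would then be passed, in place of the responses, to whatever least-squares routine is available.

```python
import math

def wilcoxon_aligned_scores(residuals):
    """Wilcoxon scores of the ranks of the reduced-model residuals.

    Ties are broken by position in the input (an assumption; the text
    does not discuss ties).
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [math.sqrt(12.0) * (r / (n + 1) - 0.5) for r in ranks]

# Hypothetical reduced-model residuals.
r0 = [0.3, -1.2, 0.8, -0.1]
scores = wilcoxon_aligned_scores(r0)
```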

If we are willing to assume an additive model (no interaction), then we 
have a two-way layout with four observations per cell. A test for poison or 
treatment effects can be carried out using an extension of Friedman’s test 
outlined in Exercise 4.5.10. The reader is asked to make the computations 
in Exercise 5.5.8. 

Other more complex designs can be analyzed in a similar way. It is only 
necessary to set up the dummy variables to produce a nonsingular design 
matrix for a multiway layout or input the appropriate design matrix in the 
case of regression. In a like manner, by combining dummy variables and 
regression variables, an analysis of covariance can be carried out. Schrader 
and McKean (1977) and McKean and Schrader (1982) discuss further 
aspects of robust analysis of variance. For a discussion of robust analysis of 
variance based on an M-estimate approach, see Schrader and
Hettmansperger (1980).


5.5. EXERCISES 

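5.5.1. Consider the simple linear regression model introduced in Section
5.1, $y_i = \beta_1 + \beta_2 x_i + e_i$, $i = 1, \ldots, N$. Assume that $e_1, \ldots, e_N$ are
i.i.d. $F \in \Omega_0$. Rather than use $U$, (5.1.4), to test $H_0 : \beta_2 = 0$ versus
$H_A : \beta_2 \ne 0$, we could use Kendall's $\tau$ defined by (4.4.6). Equivalently,
we may use the numerator $S$ of $\tau$, defined in (4.4.5). If
$x_1 < \cdots < x_N$, then $S = \sum\sum_{i<j} \operatorname{sgn}(Y_j - Y_i)$ is the test statistic and
the mean, variance, and limiting distribution under $H_0 : \beta_2 = 0$ are given
in Theorem 4.4.2. To derive the Hodges-Lehmann estimate, let
$S(\beta_2) = \sum\sum_{i<j} \operatorname{sgn}[Y_j - Y_i - \beta_2(x_j - x_i)]$, define

$$S^*(\beta_2) = \#\{(i, j) : i < j,\ (Y_j - Y_i)/(x_j - x_i) > \beta_2\},$$

and show that

$$S(\beta_2) = 2S^*(\beta_2) - N(N-1)/2.$$

Now show that the estimate is

$$\hat\beta_2 = \operatorname*{med}_{i<j}\ \frac{Y_j - Y_i}{x_j - x_i}.$$

Further, a $(1 - \alpha)100\%$ confidence interval for $\beta_2$ is given by
$[S_{(k+1)}, S_{(N^*-k)})$, where $N^* = N(N-1)/2$, $S_{(1)} < \cdots < S_{(N^*)}$ are the
ordered pairwise slopes, and $P(S^* \le k) = \alpha/2$. Finally, show that $k$ can be
approximated as

$$k \doteq \frac{N(N-1)}{4} - Z_{\alpha/2}\sqrt{\frac{N(N-1)(2N+5)}{72}},$$

where $1 - \Phi(Z_{\alpha/2}) = \alpha/2$.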
5.5.2. When $\beta$ is the true parameter value, show that $ES(Y - X\beta) = 0$,
where $S_j(Y - X\beta)$ is given by (5.2.7) for $j = 1, \ldots, p$.

5.5.3. a. Suppose we have the linear model specified by (5.2.1). Suppose
$\beta_0$ denotes the true parameter value and $S(Y - X\beta_0)$ denotes
the $p \times 1$ vector of rank statistics with $j$th component given by
(5.2.7). If assumptions in (5.2.11) hold, show that

$$\frac{1}{\sqrt{N}}\,S(Y - X\beta_0) \xrightarrow{D} Z \sim MVN(0, \Sigma).$$

b. Under the conditions of (a), show that

1 1 ) 


where p is given by (5.2.14).

5.5.4. Suppose that $\sqrt{N}(\hat\beta_0 - \beta_0)$ is bounded in probability. Then show
that, under assumptions in (5.2.11), $\sqrt{N}(\hat\beta_1 - \beta_0)$, where $\hat\beta_1$ is the
one-step estimate defined in (5.2.19), is asymptotically $MVN(0, \tau^2\Sigma^{-1})$,
where $\tau = 1/(\sqrt{12}\int f^2(x)\,dx)$. Hint: Write

$$\sqrt{N}(\hat\beta_1 - \beta_0) - \tau\Sigma^{-1}\frac{1}{\sqrt{N}}S(\beta_0)
= \sqrt{N}(\hat\beta_0 - \beta_0) + \tau\Sigma^{-1}\frac{1}{\sqrt{N}}\left[S(\hat\beta_0) - S(\beta_0)\right].$$

Use the asymptotic linearity, (5.2.10), and the boundedness in
probability of $\sqrt{N}(\hat\beta_0 - \beta_0)$ to show the right side is $o_p(1)$. Then
argue the asymptotic distribution of the second term on the left
side.

5.5.5. Consider the density estimator, (5.2.21). For each fixed $y$, derive a
formula for $E\hat f_N(y)$, and derive a similar formula for $\operatorname{Var}\hat f_N(y)$. Then
show $E\hat f_N(y) \to f(y)$ and $\operatorname{Var}\hat f_N(y) \to 0$. Hence $\hat f_N(y)$ is a consistent
estimator of $f(y)$.

5.5.6. Suppose $\hat f_N(x)$ is given by (5.2.21). Show that
$\int \hat f_N^2(x)\,dx = \int \hat f_N^*(x)\,dF_N(x)$, where $\hat f_N^*(x)$ satisfies Definition 5.2.3
with window $w^*(x) = w * w(x)$, the convolution density; see Definition A3
in the Appendix.

5.5.7. Suppose $\hat\beta$ is the full-model rank estimate, Definition 5.2.2. Under
assumptions in (5.2.11), argue that $N^{1/2}(H\hat\beta - H\beta)$ has an asymptotic
distribution which is multivariate normal. Find the covariance
matrix of the limiting distribution.

5.5.8. Using the data in Table 5.4 and assuming no interaction (additive
model), use the extended Friedman test in Exercise 4.5.10 to test
the null hypothesis of no treatment effect. Then test the null
hypothesis of no poison effect.

5.5.9. Gini's mean difference measure of variability in $X' = (X_1, \ldots, X_n)$
is defined to be

$$g(X) = \frac{\sqrt{\pi}}{n(n-1)}\sum\sum_{i<j} |X_i - X_j|.$$

a. If $X' = (X_1, \ldots, X_n)$ is a random sample from a normal
distribution with variance $\sigma^2$, then show that $Eg(X) = \sigma$, so
$g(X)$ is an unbiased estimate of the standard deviation.

b. Show that if $R_i$ is the rank of $X_i$, then

$$g(X) = \frac{2\sqrt{\pi}}{n(n-1)}\sum_{i=1}^{n}\left(R_i - \frac{n+1}{2}\right)X_i.$$

Hence, argue that the R estimate of $\beta$ in the linear model
minimizes Gini's measure of variability.
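
A numerical check of the rank representation of Gini's mean difference, offered only as a sanity check and not as a solution to the exercise: the Python sketch below computes $\sum\sum_{i<j}|X_i - X_j|$ directly and through the ranks, using the identity $\sum\sum_{i<j}|X_i - X_j| = 2\sum_i [R_i - (n+1)/2]X_i$. The normalizing constant $\sqrt{\pi}/[n(n-1)]$ is an assumption chosen so that the expectation is $\sigma$ at the normal model; the function names and the simulated sample are hypothetical.

```python
import math
import random

def gini_pairwise(x):
    """sqrt(pi) / (n(n-1)) times the sum of |x_i - x_j| over pairs i < j."""
    n = len(x)
    total = sum(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))
    return math.sqrt(math.pi) * total / (n * (n - 1))

def gini_rank_form(x):
    """The same quantity written as a linear combination with rank weights."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    weighted = sum((ranks[i] - (n + 1) / 2.0) * x[i] for i in range(n))
    return 2.0 * math.sqrt(math.pi) * weighted / (n * (n - 1))

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(25)]
direct = gini_pairwise(sample)
via_ranks = gini_rank_form(sample)
```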

5.5.10. Consider $D(Z)$ defined by (5.2.3). Prove:
a. For every $Z$ in $R^N$, $D(Z) \ge 0$.
b. For every $Z$, $W$ in $R^N$, $D(Z + W) \le D(Z) + D(W)$.
c. For every scalar $k$, $D(kZ) = |k|D(Z)$.
d. $D(Z) = 0$ if and only if $Z_1 = \cdots = Z_N$.

The exercise shows that $D(Z)$, with scores satisfying the conditions
given above, is a pseudo- (or semi-) norm. Hence the rank estimate
$\hat\beta$, Definition 5.2.2, is defined by the closest vector to the
observation vector $Y$, in the subspace defined by the model $X\beta$,
where distance is defined by the pseudonorm $D(Y - X\beta)$.

5.5.11. Exercise 5.5.10 shows that $D^*$, (5.3.6), is a natural test statistic
since it compares the distance from $Y$ to the subspace specified by
the null hypothesis and the distance from $Y$ to the subspace
defined by the full model. Hence $D^*$ is analogous to the
least-squares F statistic, which can be interpreted in terms of the
reduction in sums of squares due to fitting the reduced and full
models. Analogy with the F statistic also suggests the test statistic

$$\frac{D(Y - X\hat\beta_0) - D(Y - X\hat\beta)}{N^{-1}D(Y - X\hat\beta)}.$$

Give a heuristic argument to show $N^{-1}D(Y - X\hat\beta)$ converges in
probability to $\int x\,\phi(F(x))\,dF(x) = \delta$, where $\phi(u)$ is a general
score function such that $\int\phi(u)\,du = 0$, $\int\phi^2(u)\,du = 1$. Show that in
the case of the Wilcoxon score function, (5.2.9), $\delta$ is not equal to
$\tau/2$. Hence, the test based on this statistic is not an asymptotically
size $\alpha$ test and $D^*$ is preferable.

5.5.12. In the proof of Theorem 5.2.3, show T= where

o* is the minimum eigenvalue of $\Sigma$.


The Multivariate Location Model


6.1. INTRODUCTION 

In this brief chapter we indicate how one- and two-sample univariate tests 
and estimates can be extended to the multivariate case. If the multivariate 
problem has p components, then we will be interested in the vector of 
univariate statistics computed on each of the p components. For example, 
we will consider the vector of sign statistics in the multivariate one-sample 
model presented in the next section. 

The $p$-component vector of statistics is denoted by $S' = (S_1, \ldots, S_p)$
and the statistic $S_i$ is centered so that $ES_i = 0$ under the null hypothesis.
Even though the statistics are distribution free in the univariate model, the
vector is no longer distribution free in the multivariate model. The
difficulty occurs in the $\operatorname{Cov}(S_i, S_j)$, which depends on the joint
distribution of the observations. The solution is to estimate the
covariances and construct asymptotically distribution-free tests. Hence, the
emphasis is on the asymptotic distributions.

It is possible to construct conditional permutation or randomization tests 
which are distribution free for finite samples. These tests are only practical 
for small samples and we do not discuss them in this text. For a discussion 
of conditional permutation distribution-free tests see Maritz (1981) for 
applied issues and Puri and Sen (1971) for theoretical issues. 

Our strategy for the construction of tests is as follows:

1. Establish, under the null hypothesis, that $n^{-1/2}S$ is asymptotically
$MVN(0, V)$;

2. find a consistent estimate $\hat V$ of $V$;

3. then $S^* = n^{-1}S'\hat V^{-1}S = S'(n\hat V)^{-1}S$ will be asymptotically $\chi^2(p)$.
The test rejects the null hypothesis at approximate level $\alpha$ if
$S^* \ge \chi^2_\alpha(p)$, the upper $\alpha$ critical value from a chi-square distribution
with $p$ degrees of freedom.

Estimates of location in the multivariate model are defined by the vector
of univariate Hodges-Lehmann estimates corresponding to the univariate
test statistics in $S$. Confidence regions with approximate confidence
coefficients can also be constructed by inverting the test vectors. Another
approach to confidence regions is to find the Hodges-Lehmann estimate $\hat\theta$,
estimate the covariance matrix $W$ of the asymptotic multivariate normal
distribution of $\sqrt{n}(\hat\theta - \theta)$, then define an approximate $(1 - \alpha)100\%$
confidence ellipsoid by

$$\{\theta : n(\hat\theta - \theta)'\hat W^{-1}(\hat\theta - \theta) \le \chi^2_\alpha(p)\},$$

where $\chi^2_\alpha(p)$ is the upper $\alpha$ critical value, that is, the $100(1 - \alpha)$th
percentile, of the chi-square distribution with $p$ degrees of freedom.

It should be noted that the rank procedures are not invariant under
rotations of the axes. They are, however, scale and location invariant. This
is in contrast to the methods based on the vector of component means,
which are invariant under nonsingular transformations.


6.2. ONE-SAMPLE MULTIVARIATE LOCATION MODEL

The sampling model is specified by $n$ independent, identically distributed
$p$-variate random vectors $X_1, \ldots, X_n$, each with the $p$-variate cdf
$F(t_1 - \theta_1, \ldots, t_p - \theta_p)$. We assume the $p$-variate cdf $F$ is absolutely
continuous with absolutely continuous marginal cdfs
$F_1(t_1 - \theta_1), \ldots, F_p(t_p - \theta_p)$ and absolutely continuous bivariate marginal
cdfs $F_{ij}(t_i - \theta_i, t_j - \theta_j)$, $i < j = 1, \ldots, p$. The location parameters
$\theta_1, \ldots, \theta_p$ are the marginal medians and $F_i(0) = 1/2$, $i = 1, \ldots, p$. The
data can be displayed in a $p \times n$ array:

$$\begin{matrix}
X_{11} & X_{12} & \cdots & X_{1n}\\
\vdots & \vdots & & \vdots\\
X_{p1} & X_{p2} & \cdots & X_{pn}
\end{matrix}$$

The $t$th column is the $p$ vector $X_t$ and the $j$th row is a random sample of
size $n$ taken on the $j$th component. This model arises when $p$ measurements
are taken on each of $n$ subjects.

We wish to test $H_0 : \theta = 0$ versus $H_A : \theta \ne 0$, where $\theta' = (\theta_1, \ldots, \theta_p)$ and
$0' = (0, \ldots, 0)$. The hypothesis $H_0 : \theta = \theta^*$ can be transformed into
$H_0 : \theta = 0$ by subtracting the given $\theta^*$ from the observation vectors. Since
we make no further structural assumptions on $F(t_1, \ldots, t_p)$, the vector of
sign statistics is appropriate for the construction of a test. We use a form
of the sign statistic that has zero expectation under the null hypothesis.
Hence, for $i = 1, \ldots, p$, let

$$S_i = \sum_{t=1}^{n} \operatorname{sgn} X_{it} = \#(X_{it} > 0) - \#(X_{it} < 0), \quad i = 1, \ldots, p, \qquad (6.2.1)$$

where $\operatorname{sgn}(x) = -1$ if $x < 0$, $0$ if $x = 0$, and $+1$ if $x > 0$. We shall suppose
that no observations equal 0 (true with probability one). The sign
statistic, (1.2.1), is related to $S_i$ since $\#(X_{it} > 0) + \#(X_{it} < 0) = n$,
$t = 1, \ldots, n$, and $S_i = 2\#(X_{it} > 0) - n$; see Exercise 1.8.12. The symmetric
form is more convenient.

Let $S' = (S_1, \ldots, S_p)$; then the necessary asymptotic distribution theory
is given in the following theorem. First, we need to introduce some
additional notation. Let $X' = (X_1, \ldots, X_p)$ be a random vector with joint
cdf $F(\cdot)$. Let $p_{ij}(0,0) = P(X_i < 0, X_j < 0)$ and
$p_{ij}(0,1) = P(X_i < 0, X_j > 0)$, $p_{ij}(1,0) = P(X_i > 0, X_j < 0)$ and
$p_{ij}(1,1) = P(X_i > 0, X_j > 0)$.

Theorem 6.2.1. Under the null hypothesis $\theta = 0$,

$$n^{-1/2} S \xrightarrow{D} Z \sim MVN(0, V),$$

where $V = ((v_{ij}))$, $i, j = 1, \ldots, p$, and $v_{ii} = 1$,

$$v_{ij} = p_{ij}(0,0) + p_{ij}(1,1) - p_{ij}(0,1) - p_{ij}(1,0).$$

Proof. First, $ES_i = 0$ and $\mathrm{Var}\,S_i = n$ so that $v_{ii} = 1$. Next consider

$$\mathrm{Cov}(S_i, S_j) = \sum_{t=1}^{n} E[\mathrm{sgn}\,X_{it}\,\mathrm{sgn}\,X_{jt}]
= n[p_{ij}(0,0) + p_{ij}(1,1) - p_{ij}(0,1) - p_{ij}(1,0)], \tag{6.2.2}$$

hence $v_{ij}$ is as given in the statement of the theorem. Theorem A12 in the
Appendix implies that the limiting distribution for $n^{-1/2}S$ is $MVN(0, V)$.
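
As an illustration of Theorem 6.2.1 (not part of the original development), the off-diagonal entries of V can be written out when a pair of components is bivariate normal. The sketch below assumes zero means and correlation $\rho$, uses the identity of Exercise 6.4.3, and relies on the standard orthant probability for the bivariate normal distribution.

```latex
% Assumption: (X_i, X_j) is bivariate normal with zero means and correlation \rho.
\begin{align*}
p_{ij}(0,0) &= P(X_i < 0,\, X_j < 0) = \frac14 + \frac{\sin^{-1}\rho}{2\pi}, \\
v_{ij} &= 4\,p_{ij}(0,0) - 1 = \frac{2}{\pi}\,\sin^{-1}\rho .
\end{align*}
% Check: \rho = 0 gives v_{ij} = 0 (the sign statistics are uncorrelated),
% and \rho \to 1 gives v_{ij} \to 1 = v_{ii}.
```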
To construct the test, we need a consistent estimate of V. From (6.2.2),
the law of large numbers implies that

$$\hat v_{ij} = \frac{1}{n}\sum_{t=1}^{n} \mathrm{sgn}(X_{it})\,\mathrm{sgn}(X_{jt}) \xrightarrow{P} v_{ij}. \tag{6.2.3}$$

Hence, define $\hat V$ by $\hat v_{ii} = 1$ and $\hat v_{ij}$ given by (6.2.3). Then,
under the null hypothesis, $\hat V$ is a consistent estimate of V and

$$S^* = S'(n\hat V)^{-1}S \tag{6.2.4}$$

is asymptotically distributed as $\chi^2(p)$.

When p = 2, a particularly simple form for $S^*$ is available. You are
asked to show in Exercise 6.4.1 that $S^*$ can be written as

$$S^* = \frac{(N_{00} - N_{11})^2}{N_{00} + N_{11}} + \frac{(N_{01} - N_{10})^2}{N_{01} + N_{10}} \tag{6.2.5}$$

where $N_{00} = \#(X_{1t} < 0, X_{2t} < 0)$, $N_{11} = \#(X_{1t} > 0, X_{2t} > 0)$,
$N_{01} = \#(X_{1t} < 0, X_{2t} > 0)$ and $N_{10} = \#(X_{1t} > 0, X_{2t} < 0)$,
$t = 1, \ldots, n$. These quantities are conveniently displayed in a two-way table:

                 X_2t
    X_1t      < 0      > 0
    < 0       N_00     N_01
    > 0       N_10     N_11

The asymptotic distribution of the vector of sample medians is developed
in Exercise 6.4.4.
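
The agreement between the matrix form (6.2.4) and the two-way-table form (6.2.5) can be checked numerically. The following is a minimal sketch in Python; the function name and the cell counts are hypothetical illustrations, not data from the text.

```python
def sign_statistic_p2(n00, n01, n10, n11):
    """Return S* for p = 2 computed two ways from the two-way table counts.

    n00 = #(X1<0, X2<0), n01 = #(X1<0, X2>0),
    n10 = #(X1>0, X2<0), n11 = #(X1>0, X2>0).
    """
    n = n00 + n01 + n10 + n11
    # Componentwise sign statistics, as in (6.2.1).
    s1 = (n10 + n11) - (n00 + n01)
    s2 = (n01 + n11) - (n00 + n10)
    # n * v12_hat from (6.2.3): concordant sign pairs minus discordant ones.
    c = (n00 + n11) - (n01 + n10)
    # Quadratic form S'(n V_hat)^{-1} S with n V_hat = [[n, c], [c, n]].
    det = n * n - c * c
    matrix_form = (n * s1 * s1 - 2 * c * s1 * s2 + n * s2 * s2) / det
    # Closed form (6.2.5).
    closed_form = (n00 - n11) ** 2 / (n00 + n11) + (n01 - n10) ** 2 / (n01 + n10)
    return matrix_form, closed_form


# Hypothetical counts, chosen only to exercise the formulas.
matrix_form, closed_form = sign_statistic_p2(4, 2, 1, 5)
```

Both returned values agree, which is the content of Exercise 6.4.1 for this particular table; the closed form requires both diagonal sums of the table to be nonzero.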

It may be appropriate to impose a symmetry assumption on the joint cdf
$F(t_1, \ldots, t_p)$. If the joint pdf $f(t_1, \ldots, t_p)$ satisfies

$$f(-t_1, \ldots, -t_p) = f(t_1, \ldots, t_p) \tag{6.2.6}$$

we say f is diagonally symmetric about 0. This results in symmetric
marginal densities $f_1, \ldots, f_p$, and the location vector
$\theta' = (\theta_1, \ldots, \theta_p)$ represents the centers of symmetry of the
marginal distributions.
Under this assumption, we will consider the vector of Wilcoxon signed
rank statistics computed on the p components of the observation vectors.
For $i = 1, \ldots, p$, let

$$T_i = \sum_{t=1}^{n} \frac{R_{it}}{n+1}\,\mathrm{sgn}(X_{it}) \tag{6.2.7}$$

where, as in (2.2.2), $R_{it}$ is the rank of $|X_{it}|$ among $|X_{i1}|, \ldots, |X_{in}|$,
and let $T' = (T_1, \ldots, T_p)$. The following theorem provides the limiting
distribution needed to develop a test of $H_0: \theta = 0$ versus $H_A: \theta \ne 0$.

Theorem 6.2.2. Suppose $F(t_1, \ldots, t_p)$ has a density $f(t_1, \ldots, t_p)$,
diagonally symmetric about 0. Then

$$n^{-1/2} T \xrightarrow{D} Z \sim MVN(0, V)$$

where $v_{ii} = 1/3$ and

$$v_{ij} = 4\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F_i(u)F_j(v)\,dF_{ij}(u, v) - 1.$$


Proof. Write $T_i$ as

$$T_i = \frac{2}{n+1}\left[\sum_{t=1}^{n} R_{it}\,s(X_{it}) - \frac{n(n+1)}{4}\right] \tag{6.2.8}$$

where $T_i^+ = \sum R_{it}\,s(X_{it})$ is the form studied in Example 2.5.5. Using the
projection result in Example 2.5.5, define

$$U_i = n^{-1/2}\sum_{t=1}^{n}\,[2F_i(X_{it}) - 1].$$

We have used the fact that $F_i$ is symmetric about 0, so that
$1 - F_i(-x) = F_i(x)$. Let $U' = (U_1, \ldots, U_p)$; then $n^{-1/2}T - U$
converges in probability to 0. Hence the limiting distribution of $n^{-1/2}T$
is the same as that of U.

First note that $\mathrm{Var}\,U_i = 4(1/12) = 1/3$, since $F_i(X_{it})$ is uniformly
distributed on (0, 1). Next

$$\mathrm{Cov}(U_i, U_j) = 4\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F_i(u)F_j(v)\,dF_{ij}(u, v) - 1.$$

Hence the formulas for $v_{ii}$ and $v_{ij}$ are established. The limiting
multivariate normality of U follows from Theorem A12 in the Appendix.


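To construct the test we need a consistent estimate of V under the null
hypothesis. Note that from (6.2.8) and (2.2.4)

$$\mathrm{Var}\,T_i = \frac{4}{(n+1)^2}\cdot\frac{n(n+1)(2n+1)}{24} = \frac{n(2n+1)}{6(n+1)} \doteq \frac{n}{3}.$$

Hence we take

$$\hat v_{ii} = \frac{1}{n}\mathrm{Var}\,T_i = \frac{2n+1}{6(n+1)}. \tag{6.2.9}$$

We next give a heuristic argument to show that $\hat v_{ij}$, defined by

$$\hat v_{ij} = \frac{1}{n}\sum_{t=1}^{n} \frac{R_{it}R_{jt}}{(n+1)^2}\,\mathrm{sgn}(X_{it})\,\mathrm{sgn}(X_{jt}), \tag{6.2.10}$$

is a consistent estimate of $v_{ij}$. Let $F^+_{in}(x)$ be the empirical cdf computed
on the absolute values of the ith component. Hence we can write (6.2.10) as

$$\hat v_{ij} = \frac{n^2}{(n+1)^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F^+_{in}(|s|)\,F^+_{jn}(|t|)\,\mathrm{sgn}(s)\,\mathrm{sgn}(t)\,dF_{ijn}(s, t)$$

where $F_{ijn}(s, t)$ is the bivariate empirical cdf that assigns mass $n^{-1}$ to
each pair $(x_{it}, x_{jt})$, $t = 1, \ldots, n$. From the convergence in probability
of the empirical cdfs, $\hat v_{ij}$ converges in probability to

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F^+_i(|s|)\,F^+_j(|t|)\,\mathrm{sgn}(s)\,\mathrm{sgn}(t)\,dF_{ij}(s, t).$$

The right side can be written

$$\begin{aligned}
&\int_0^{\infty}\int_0^{\infty} F^+_i(s)F^+_j(t)\,dF_{ij}(s,t)
 - \int_{t>0}\int_{s<0} F^+_i(-s)F^+_j(t)\,dF_{ij}(s,t) \\
&\quad - \int_{t<0}\int_{s>0} F^+_i(s)F^+_j(-t)\,dF_{ij}(s,t)
 + \int_{-\infty}^{0}\int_{-\infty}^{0} F^+_i(-s)F^+_j(-t)\,dF_{ij}(s,t).
\end{aligned} \tag{6.2.11}$$

Now recall $F^+_i(x) = P(|X_i| \le x) = F_i(x) - F_i(-x) = 2F_i(x) - 1$ when
x > 0, and $F^+_i(x) = 0$ when x < 0. Also $F^+_i(-x) = 1 - 2F_i(x)$ when x < 0
and $F^+_i(-x) = 0$ when x > 0. Substituting into (6.2.11), multiplying the
factors in each integrand, and recombining the integrals yields

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\,[4F_i(s)F_j(t) - 2F_i(s) - 2F_j(t) + 1]\,dF_{ij}(s, t). \tag{6.2.12}$$

Using $\int\int F_i(s)\,dF_{ij}(s,t) = \int F_i(s)\int dF_{ij}(s,t) = \int F_i(s)\,dF_i(s) = 1/2$,
(6.2.12) reduces to $v_{ij}$ given in Theorem 6.2.2.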

Hence defining $\hat V = ((\hat v_{ij}))$, $i, j = 1, \ldots, p$, with $\hat v_{ii}$ in (6.2.9)
and $\hat v_{ij}$ in (6.2.10), we have, under the null hypothesis, that

$$T^* = T'(n\hat V)^{-1}T \tag{6.2.13}$$

is asymptotically distributed as $\chi^2(p)$.

Both $S^*$ and $T^*$ are easy to compute and $T^*$ is illustrated in the
following example. Exercise 6.4.2 shows that the multivariate test may
provide a distinct advantage over the naive strategy of carrying out
univariate tests separately on each component.

If we assume that $f(t_1 - \theta_1, \ldots, t_p - \theta_p)$ is the multivariate normal
density, centered at $\theta' = (\theta_1, \ldots, \theta_p)$, with covariance matrix $\Sigma$, then
Hotelling's statistic is the basis for testing $H_0: \theta = 0$ versus
$H_A: \theta \ne 0$. The statistic is given by

$$H^2 = n\bar X'\hat\Sigma^{-1}\bar X \tag{6.2.14}$$

where $\bar X' = (\bar X_1, \ldots, \bar X_p)$, $\hat\Sigma = ((\hat\sigma_{ij}))$, $i, j = 1, \ldots, p$, with
$\hat\sigma_{ij} = (n-1)^{-1}\sum_{t=1}^{n}(X_{it} - \bar X_i)(X_{jt} - \bar X_j)$.
Hotelling's test rejects $H_0: \theta = 0$ at level $\alpha$ if

$$H^2 \ge [(n-1)p/(n-p)]\,F_\alpha(p, n-p)$$

where $F_\alpha(p, n-p)$ is the F critical point with p and n - p degrees of
freedom.

Example 6.2.1. Ryan et al. (1976, p. 276) provide a table of measurements
on male Peruvian Indians over 21 years old who were born at a high
altitude and whose parents were also born at a high altitude. Table 6.1 gives
systolic and diastolic blood pressures for 15 individuals. We will suppose
that the bivariate distribution is diagonally symmetric about
$\theta' = (\theta_1, \theta_2)$, where $\theta_1$ is the center of the marginal distribution of
systolic blood pressures and likewise for $\theta_2$. We wish to test
$H_0: \theta' = (120, 80)$ versus $H_A: \theta' \ne (120, 80)$, the standard values for
healthy males over 21 in the United States. Hence we suspect that the
Indian blood pressure differs from its American counterpart.


Table 6.1. Blood Pressure

                      R(|X_1 - 120|) x                     R(|X_2 - 80|) x
    X_1   X_1 - 120   sgn(X_1 - 120)     X_2   X_2 - 80    sgn(X_2 - 80)

    170       50          15              76      -4           -3.5
    125        5           5              75      -5           -5
    148       28          14             120      40           15
    140       20          13              78      -2           -1.5
    106      -14         -10              72      -8           -8
    108      -12          -8              62     -12          -10.5
    124        4           3              70     -10           -9
    134       14          10              64     -16          -13.5
    116       -4          -3              76      -4           -3.5
    114       -6          -6.5            74      -6           -6.5
    118       -2          -1              68     -12          -10.5
    138       18          12              78      -2           -1.5
    134       14          10              86       6            6.5
    124        4           3              64     -16          -13.5
    114       -6          -6.5            66     -14          -12
Let $X' = (X_1, X_2)$ where $X_1$ = systolic and $X_2$ = diastolic. From (6.2.9),
$\hat v_{11} = \hat v_{22} = .323$ and using the third and sixth columns in Table 6.1
with (6.2.10), $\hat v_{12} = .0684$. Now

$$(n\hat V)^{-1} = \begin{pmatrix} .216 & -.046 \\ -.046 & .216 \end{pmatrix}$$

and from (6.2.13), $T^* = 8.49$. When this is compared to $\chi^2_{.025}(2) = 7.38$, we
are able to reject $H_0: \theta' = (120, 80)$ at approximate level $\alpha = .025$. We
conclude that, at this significance level, there is evidence that the Peruvian
blood pressure differs from the American blood pressure. The point
estimate of $\theta' = (\theta_1, \theta_2)$ is the pair of medians of the Walsh averages. We
find $\hat\theta' = (126, 73)$.

Further, from Table 6.1, we have $\bar X_1 = 127.533$, $\bar X_2 = 75.267$, and

$$\hat\Sigma = \begin{pmatrix} 288.410 & 114.133 \\ 114.133 & 195.781 \end{pmatrix}.$$

If we assume the underlying bivariate distribution is bivariate normal, then
Hotelling's test is appropriate. We find, for $\theta_0' = (120, 80)$,

$$H^2 = n(\bar X - \theta_0)'\hat\Sigma^{-1}(\bar X - \theta_0) = 8.88.$$

The .025 critical point for this test is $[(n-1)p/(n-p)]F_\alpha(p, n-p)$ =
(2.15)(4.965) = 10.694. Hotelling's test fails to reject the null hypothesis at
$\alpha = .025$. It should be noted that the significant $T^*$ was referred to an
approximate critical value based on the asymptotic distribution and not on
the exact distribution. Further study of the approximating distribution of
$T^*$ for small to moderate samples is needed.
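
The computation of $T^*$ in Example 6.2.1 can be retraced step by step. The sketch below is an illustration in Python under stated assumptions: the helper names (`midranks`, `signed_ranks`) are hypothetical, tied absolute values receive average ranks, and the inputs are the difference columns of Table 6.1 as printed.

```python
def midranks(values):
    """Ranks of values (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def signed_ranks(diffs):
    """Rank of |d| among the absolute differences, carrying the sign of d."""
    ranks = midranks([abs(d) for d in diffs])
    return [r if d > 0 else -r for r, d in zip(ranks, diffs)]


# Difference columns X_1 - 120 and X_2 - 80 as printed in Table 6.1.
d1 = [50, 5, 28, 20, -14, -12, 4, 14, -4, -6, -2, 18, 14, 4, -6]
d2 = [-4, -5, 40, -2, -8, -12, -10, -16, -4, -6, -12, -2, 6, -16, -14]
n = len(d1)

sr1, sr2 = signed_ranks(d1), signed_ranks(d2)
t1 = sum(sr1) / (n + 1)                      # T_1 from (6.2.7)
t2 = sum(sr2) / (n + 1)                      # T_2 from (6.2.7)
v11 = (2 * n + 1) / (6 * (n + 1))            # (6.2.9)
v12 = sum(a * b for a, b in zip(sr1, sr2)) / (n * (n + 1) ** 2)   # (6.2.10)

# Quadratic form T'(n V_hat)^{-1} T for the 2 x 2 matrix n V_hat = [[a, b], [b, a]].
a, b = n * v11, n * v12
t_star = (a * t1 * t1 - 2 * b * t1 * t2 + a * t2 * t2) / (a * a - b * b)
```

Under these assumptions the intermediate quantities agree with the values quoted in the example to the precision shown there, and `t_star` can be compared with the chi-square critical point 7.38.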

Since the covariances among the components of S and T depend on the
underlying univariate and bivariate distributions, the multivariate
extensions of the sign and Wilcoxon signed rank statistics are not distribution
free under the null hypothesis. The tests just proposed use estimates of the
needed marginal distributions, namely the empirical cdfs. We then have
asymptotically distribution-free tests. This approach is quite general and
can be applied to other score functions. Multivariate tests based on
Winsorized Wilcoxon signed rank statistics are described and illustrated on
a data set by Utts and Hettmansperger (1980). General scores tests are
developed by Puri and Sen (1971, Chapter 4).

Asymptotic efficiency can be developed in a manner similar to the
univariate case. It is necessary to establish the limiting distribution of the
normalized vector of test statistics under a sequence of alternatives
$\theta_n = n^{-1/2}\theta$ converging to 0.
The result of the calculation is that the limiting distribution is altered only
in the mean of the asymptotic distribution. Hence, referring to the notation
in the introduction, the normalized vector has an asymptotic $MVN(\mu, V)$
distribution along $\theta_n$. For example, given the vector of sign statistics S,
$n^{-1/2}ES$ converges to $\mu' = 2(\theta_1 f_1(0), \ldots, \theta_p f_p(0))$. The limiting
distribution of the test statistic $S^*$, along $\theta_n$, will be noncentral chi-square
with p degrees of freedom, and noncentrality parameter $\mu'V^{-1}\mu$. Since the
asymptotic local power is an increasing function of the noncentrality
parameter, the efficiency of two such test statistics is taken to be the ratio of
their noncentrality parameters. Bickel (1965) provides an extensive analysis
of the efficiency properties of the multivariate sign, Wilcoxon, and
Hotelling tests. He concludes that the former tests are better than Hotelling's
test in the presence of gross errors, but they should be used with caution in
situations where considerable degeneracy is present.

The vector of Hodges-Lehmann estimates derived from the vector of
rank tests generally has a distribution that is asymptotically multivariate
normal. Estimation efficiency is defined in terms of the generalized
variance, which is the determinant of the asymptotic covariance matrix. Hence,
unlike the univariate case, testing and estimation efficiency have different
formulas. The efficiency results are quite similar, however. See Bickel
(1964) for a discussion of the estimation case. Exercises 6.4.4 and 6.4.5
develop the asymptotic distributions and provide some efficiency results for
the estimators.


6.3. TWO-SAMPLE MULTIVARIATE LOCATION MODEL

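We will discuss the comparison of two multivariate samples, size m and n,
respectively. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be samples from p-variate
distributions with cdfs $F(t_1, \ldots, t_p)$ and $F(t_1 - \Delta_1, \ldots, t_p - \Delta_p)$,
respectively. The data can be displayed as

$$\begin{array}{cccccc}
X_{11} & \cdots & X_{1m} & Y_{11} & \cdots & Y_{1n} \\
\vdots &        & \vdots & \vdots &        & \vdots \\
X_{p1} & \cdots & X_{pm} & Y_{p1} & \cdots & Y_{pn}
\end{array}$$

We will suppose that F is absolutely continuous with absolutely continuous
marginal distributions. No symmetry assumption is needed. The parameter
$\Delta' = (\Delta_1, \ldots, \Delta_p)$ represents the amount of shift in each component in
passing from the X distribution to the Y distribution. We wish to construct
a test of $H_0: \Delta = 0$ versus $H_A: \Delta \ne 0$.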
The test that we will consider is a multivariate version of the
Mann-Whitney-Wilcoxon test developed in Section 3.2. Let $U' = (U_1, \ldots, U_p)$,
where

$$U_i = \sum_{t=1}^{n}\left[\frac{R_{it}}{N+1} - \frac{1}{2}\right], \quad i = 1, \ldots, p, \tag{6.3.1}$$

N = m + n. Hence $U_i$ is the centered rank sum statistic where $R_{it}$ is the
rank of $Y_{it}$ in the combined samples of the ith component. Under
$H_0: \Delta = 0$, $EU = 0$. The following theorem provides the limiting
distribution of $N^{-1/2}U$ under the null hypothesis.

Theorem 6.3.1. Suppose the null hypothesis $H_0: \Delta = 0$ is true. Suppose
$m, n \to \infty$ and $m/N \to \lambda$, $0 < \lambda < 1$. Then

$$N^{-1/2}\,U \xrightarrow{D} Z \sim MVN(0, V)$$

where $V = ((v_{ij}))$, $i, j = 1, \ldots, p$, and

$$v_{ii} = \lambda(1 - \lambda)/12,$$

$$v_{ij} = \lambda(1 - \lambda)\left[\int\int F_i(u)F_j(v)\,dF_{ij}(u, v) - 1/4\right], \quad i \ne j.$$


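Proof. The argument is sketched because it is similar to that of Theorem
6.2.2. We first note that

$$N^{-1/2}U_i = N^{-1/2}\sum_{t=1}^{n}\left[\frac{R_{it}}{N+1} - \frac{1}{2}\right] = \frac{1}{\sqrt{N}\,(N+1)}\,W_i^*$$

where $W_i^*$ is defined in the proof of Theorem 3.2.4. Hence the projection is
given by

$$V_i = \frac{1}{\sqrt{N}\,(N+1)}\sum_{t=1}^{N} b_t\left[F_i(Z_{it}) - \frac{1}{2}\right]$$

where $Z_{i1}, \ldots, Z_{iN}$ is the combined sample in the ith component, with the
X observations listed first, and

$$b_t = \begin{cases} -n, & t = 1, \ldots, m \\ \phantom{-}m, & t = m+1, \ldots, N. \end{cases}$$

Since $F_i(Z_{it}) - \frac{1}{2}$ is uniformly distributed on (-1/2, 1/2) we have

$$\mathrm{Var}\,V_i = \frac{1}{12N(N+1)^2}\sum_{t=1}^{N} b_t^2 = \frac{mn}{12(N+1)^2} \to \frac{\lambda(1-\lambda)}{12}.$$

Further,

$$\mathrm{Cov}(V_i, V_j) \to \lambda(1 - \lambda)\left[\int\int F_i(u)F_j(v)\,dF_{ij}(u, v) - 1/4\right]. \tag{6.3.2}$$

The limiting multivariate normality of the projection follows from Theorem
A12 or Theorem A13 with $a_t = b_t/(N+1)$.

By the Projection Theorem 2.5.2, $N^{-1/2}U$ has the same multivariate
normal limiting distribution and the proof is complete.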


In order to construct a test of $H_0: \Delta = 0$ versus $H_A: \Delta \ne 0$ we first need
a consistent estimate of V, the asymptotic covariance matrix. For $v_{ii}$ we will
take

$$\hat v_{ii} = \frac{mn}{12N(N+1)}. \tag{6.3.3}$$

If we substitute the single and bivariate marginal empirical distribution
functions into (6.3.2), we will have a consistent estimate of $v_{ij}$. The result,
slightly modified, is

$$\hat v_{ij} = \frac{mn}{N^2(N-1)}\left\{\sum_{t=1}^{N}\frac{R_{it}}{N+1}\cdot\frac{R_{jt}}{N+1} - \frac{N}{4}\right\}
= \frac{mn}{N^2(N-1)(N+1)^2}\left\{\sum_{t=1}^{N} R_{it}R_{jt} - \frac{N(N+1)^2}{4}\right\}, \tag{6.3.4}$$

where the sums extend over all N observations in the combined sample.
Straight substitution in (6.3.2) results in $N^3$ rather than $N^2(N-1)$. Equation
(6.3.4) is the appropriate covariance equation for the conditional joint
distribution of $U_i$ and $U_j$, given the combined sample observations. Equation
(6.3.4) is the appropriate equation for the finite sample permutation
test. See Maritz (1981, Chapter 7).

Table 6.2

Control Group

1: 1.21 0.92 0.80 0.85 0.98 1.15 1.10 1.02 1.18 1.09

2: 0.61 0.43 0.35 0.48 0.42 0.52 0.50 0.53 0.45 0.40

Treatment Group

1: 1.40 1.17 1.23 1.19 1.38 1.17 1.31 1.30 1.22 1.00 1.12 1.09

2: 0.50 0.39 0.44 0.37 0.42 0.45 0.41 0.47 0.29 0.30 0.27 0.35

Now, using (6.3.3) and (6.3.4), the test statistic is

$$U^* = N^{-1}U'(\hat V)^{-1}U = U'(N\hat V)^{-1}U. \tag{6.3.5}$$

Under $H_0: \Delta = 0$, $U^*$ is asymptotically chi-square with p degrees of
freedom. Hence we reject $H_0: \Delta = 0$, at approximate level $\alpha$, if
$U^* \ge \chi^2_\alpha(p)$.

Example 6.3.1. This example is taken from Morrison (1976, p. 167). The
data consist of the levels of two biochemical components found in the
brains of mice. Twenty-four mice of the same strain were divided randomly
into a treatment group (which received a drug) and a control group. It is
hypothesized that the biochemical levels will be different for the two
groups. Let $\Delta' = (\Delta_1, \Delta_2)$ be the difference in medians in the two bivariate
populations. Two mice in the control group died. Hence we have a control
sample $X_1, \ldots, X_{10}$ from $F(t_1, t_2)$ and a treatment sample $Y_1, \ldots, Y_{12}$
from $F(t_1 - \Delta_1, t_2 - \Delta_2)$, and we wish to test $H_0: \Delta = 0$ versus
$H_A: \Delta \ne 0$. Table 6.2 provides the amounts of the components in
micrograms per gram of brain tissue.

The data analysis proceeds by combining the first component of the
control and treatment groups and recording the ranks. The same is done
for the second component. Ties are assigned the average rank. Ranks are
shown in Table 6.3.

Table 6.3. Ranks of the Data

Control Group

1: 16 3 1 2 4 11 9 6 14 7.5

2: 22 12 4.5 17 10.5 20 18.5 21 14.5 8

Treatment Group

1: 22 12.5 18 15 21 12.5 20 19 17 5 10 7.5

2: 18.5 7 13 6 10.5 14.5 9 16 2 3 1 4.5
We find from (6.3.1) that $U_1 = 1.80$ and $U_2 = -1.43$. Further
$\hat v_{11} = \hat v_{22} = mn/[12N(N+1)] = .01976$ and $\hat v_{12} = .0029$. Hence, with N = 22,

$$(N\hat V)^{-1} = \begin{pmatrix} 2.3523 & -.3496 \\ -.3496 & 2.3523 \end{pmatrix}$$

and $U^* = 14.22$. A test at approximate level .05 compares $U^*$
to $\chi^2_{.05}(2) = 5.99$; hence $U^*$ easily rejects $H_0: \Delta = 0$. The
Hodges-Lehmann estimate of $\Delta'$ is (.19, -.08); see (3.2.5).

Maritz (1981, p. 243) carries out a test using normal scores. He finds a
value of 15.00 which is also compared to $\chi^2_{.05}(2) = 5.99$. Hence the
conclusion is the same. For an analysis based on the multivariate extension of
Mood's test see Exercise 6.4.8. The normal theory test would be based on
the two-sample Hotelling statistic; see Morrison (1967, Chapter 4).
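
The steps of Example 6.3.1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's own program: the helper `midranks` is a hypothetical name, ties receive average ranks as in Table 6.3, and $\hat v_{ii}$ is taken from (6.3.3) without any tie adjustment.

```python
def midranks(values):
    """Ranks of values (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


# Table 6.2: control (X) and treatment (Y) samples, components 1 and 2.
x1 = [1.21, 0.92, 0.80, 0.85, 0.98, 1.15, 1.10, 1.02, 1.18, 1.09]
x2 = [0.61, 0.43, 0.35, 0.48, 0.42, 0.52, 0.50, 0.53, 0.45, 0.40]
y1 = [1.40, 1.17, 1.23, 1.19, 1.38, 1.17, 1.31, 1.30, 1.22, 1.00, 1.12, 1.09]
y2 = [0.50, 0.39, 0.44, 0.37, 0.42, 0.45, 0.41, 0.47, 0.29, 0.30, 0.27, 0.35]
m, n = len(x1), len(y1)
N = m + n

# Ranks in the combined sample of each component (X observations listed first).
r1 = midranks(x1 + y1)
r2 = midranks(x2 + y2)

# Centered scores; U_i sums them over the treatment sample, as in (6.3.1).
a1 = [r / (N + 1) - 0.5 for r in r1]
a2 = [r / (N + 1) - 0.5 for r in r2]
u1, u2 = sum(a1[m:]), sum(a2[m:])

v11 = m * n / (12 * N * (N + 1))                                        # (6.3.3)
v12 = m * n / (N ** 2 * (N - 1)) * sum(p * q for p, q in zip(a1, a2))   # (6.3.4)

# Quadratic form U'(N V_hat)^{-1} U for N V_hat = [[a, b], [b, a]].
a, b = N * v11, N * v12
u_star = (a * u1 * u1 - 2 * b * u1 * u2 + a * u2 * u2) / (a * a - b * b)
```

Carried out without rounding the intermediate quantities, this arithmetic gives a value of about 14.3, close to the 14.22 quoted above, and leads to the same conclusion when compared with 5.99.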


See Puri and Sen (1971, Chapter 5) for a development of the theory for
general score functions. They also extend the results from two-sample to
multisample multivariate location models. Sen and Puri (1977) develop
aligned rank tests, with general score functions, in the multivariate linear
model.


6.4. EXERCISES

6.4.1. Verify that (6.2.5) can be derived from (6.2.4).

6.4.2. Suppose that a large high school is concerned that emphasis is
placed on the science program at the expense of the language
program. To test this hypothesis 100 juniors were given the SAT
(Scholastic Aptitude Test). We have a pair of measurements (V, Q),
the verbal and quantitative parts of the SAT test. The following
table provides the numbers below and above the national average
on the two components.

                    Q
    V         Below    Above
    Below      34       22
    Above       8       36

Let $S_V$ = number of observations above the national verbal average
($\theta_V$) and let $S_Q$ = number of observations above the national
quantitative average ($\theta_Q$). Use $S^*$ to test $H_0: \theta' = (\theta_V, \theta_Q)$
versus $H_A: \theta' \ne (\theta_V, \theta_Q)$ at approximate significance level
$\alpha = .05$. Then test each component separately at $\alpha = .05$ using the
sign test in Section 1.2.

6.4.3. Show that, in (6.2.2),
$p_{ij}(0,0) + p_{ij}(1,1) - p_{ij}(0,1) - p_{ij}(1,0) = 4p_{ij}(0,0) - 1$.
Hint: Use the fact that the marginal medians are 0.

6.4.4. Let $\hat\theta$ be the p-component vector of sample medians. Then
$S(\hat\theta) \doteq 0$ where $S_i(\theta) = \sum_{t=1}^{n}\mathrm{sgn}(X_{it} - \theta_i)$; see (6.2.1).

a. Using Exercise 2.10.18, show that

$$\sqrt{n}\,\hat\theta = \mathrm{diag}\left(\frac{1}{2f_1(0)}, \ldots, \frac{1}{2f_p(0)}\right)\frac{1}{\sqrt{n}}\,S(0) + o_p(1)$$

where we let the true value of $\theta$ be 0 without loss of generality,
$\mathrm{diag}(a_1, a_2, \ldots, a_p)$ denotes a $p \times p$ diagonal matrix with ith
diagonal element $a_i$, and $o_p(1)$ is a $p \times 1$ vector that converges
to 0 in probability.

b. Let D denote the diagonal matrix in (a) and argue that

$$\sqrt{n}\,\hat\theta \xrightarrow{D} Z \sim MVN(0, DVD')$$

with V given by Theorem 6.2.1.

c. For p = 2 show

$$DVD' = \frac{1}{4}\begin{pmatrix} \dfrac{1}{f_1^2(0)} & \dfrac{\gamma}{f_1(0)f_2(0)} \\[2ex] \dfrac{\gamma}{f_1(0)f_2(0)} & \dfrac{1}{f_2^2(0)} \end{pmatrix}$$

where $\gamma = 4P(X_1 < 0, X_2 < 0) - 1$.

d. Suppose the bivariate central limit theorem applies and

$$\sqrt{n}\,\bar X \xrightarrow{D} Z \sim MVN(0, \Sigma)$$

with variances $\sigma_1^2$, $\sigma_2^2$ and correlation $\rho$. Then the efficiency of $\hat\theta$
relative to $\bar X$ is the (1/p)th root of the ratio of the asymptotic
generalized variances. Recall the generalized variance is the
determinant of the covariance matrix. Hence show that

$$e(\hat\theta, \bar X) = 4\sigma_1\sigma_2 f_1(0)f_2(0)\left[\frac{1 - \rho^2}{1 - \gamma^2}\right]^{1/2}.$$

e. Suppose the underlying distribution is bivariate normal. Then
show that the efficiency in (d) reduces to

$$e(\hat\theta, \bar X) = \frac{2}{\pi}\left[\frac{1 - \rho^2}{1 - (4/\pi^2)[\sin^{-1}(\rho)]^2}\right]^{1/2}.$$

Plot $e(\hat\theta, \bar X)$ as a function of $\rho$ and discuss the efficiency. Hint:
Use Exercise 1.8.14.

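6.4.5. Let $\hat\theta$ be the p-component vector in which $\hat\theta_i$ is the median of the
Walsh averages of the n observations in the ith component. Then
$T(\hat\theta) \doteq 0$ where

$$T_i(\theta) = \sum_{t=1}^{n}\frac{R_{it}}{n+1}\,\mathrm{sgn}(X_{it} - \theta_i)$$

and $R_{it}$ is the rank of $|X_{it} - \theta_i|$ among $|X_{i1} - \theta_i|, \ldots, |X_{in} - \theta_i|$.

a. Using Theorem 2.7.2 and assuming the true value of $\theta$ is 0,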

without loss of generality, show that 

Hence, show 

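$$n^{1/2}\hat\theta = \mathrm{diag}\big((2f_1^*)^{-1}, \ldots, (2f_p^*)^{-1}\big)\,n^{-1/2}\,T(0) + o_p(1),$$

where $f_i^* = \int f_i^2(x)\,dx$ and $\mathrm{diag}(a_1, \ldots, a_p)$ is a $p \times p$ diagonal
matrix with ith diagonal element $a_i$.

b. Let D be the diagonal matrix in (a). Show that

$$\sqrt{n}\,\hat\theta \xrightarrow{D} Z \sim MVN(0, DVD')$$

where V is given in Theorem 6.2.2.

c. For p = 2 show, if $DVD' = ((d_{ij}))$, i, j = 1, 2, then

$$d_{ii} = \frac{1}{12\left[\int f_i^2(x)\,dx\right]^2}, \quad i = 1, 2.$$

d. Show that the efficiency of $\hat\theta$ relative to $\bar X$ for p = 2 is

$$e(\hat\theta, \bar X) = 12\sigma_1\sigma_2\int_{-\infty}^{\infty} f_1^2(x)\,dx\int_{-\infty}^{\infty} f_2^2(x)\,dx\left[\frac{1 - \rho^2}{1 - 9\delta^2}\right]^{1/2}$$

where $\delta = 4\int\int F_1(x)F_2(y)\,dF_{12}(x, y) - 1$.

e. Show that, if $F_{12}$ is the bivariate normal distribution,

$$e(\hat\theta, \bar X) = \frac{3}{\pi}\left[\frac{1 - \rho^2}{1 - (36/\pi^2)[\sin^{-1}(\rho/2)]^2}\right]^{1/2}.$$

Plot $e(\hat\theta, \bar X)$ as a function of $\rho$ and discuss the relative behavior
of $\hat\theta$ and $\bar X$. Hint: Use (2.7.9).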

6.4.6. For the data in Example 6.2.1, use sign scores to find the point
estimate of $\theta$ and test the hypothesis $H_0: \theta' = (120, 80)$ versus
$H_A: \theta' \ne (120, 80)$ where $\theta$ is the vector of marginal medians.

6.4.7. The exercise extends Mood's test, Example 3.4.2, to the multivariate
two-sample location model described at the beginning of Section
6.3. We consider testing $H_0: \Delta = 0$ versus $H_A: \Delta \ne 0$. Let, for
$i = 1, \ldots, p$, $M_i = \#(Y_{it} > C_i) - n/2$, $t = 1, \ldots, n$, where $C_i$ is
the median of the combined data in the ith coordinate. Hence $M_i$ is
the centered Mood's statistic. Let $M' = (M_1, \ldots, M_p)$ and show
that, if $m, n \to \infty$ so that $m/N \to \lambda$, $0 < \lambda < 1$, N = m + n, then

$$N^{-1/2}M \xrightarrow{D} Z \sim MVN(0, V)$$

where $V = ((v_{ij}))$, $i, j = 1, \ldots, p$, and

$$v_{ii} = \lambda(1 - \lambda)/4,$$

$$v_{ij} = \lambda(1 - \lambda)\,[P(Y_i > 0, Y_j > 0) - 1/4].$$

Further, argue that $\hat v_{ii} = mn/[4N(N - 1)]$ and

$$\hat v_{ij} = \frac{mn}{N(N-1)}\left(\frac{N_{ij}}{N} - \frac{1}{4}\right)$$

(where $N_{ij}$ is the number of pairs, in the ith and jth components, in
the combined sample that are both positive) are consistent
estimates of $v_{ii}$ and $v_{ij}$, respectively. Finally argue that
$M^* = M'(N\hat V)^{-1}M$ is asymptotically chi-square with p degrees of
freedom.

6.4.8. Based on the results of Exercise 6.4.7, carry out the test of
hypothesis in Example 6.2.1. Find the estimate of $\Delta$ based on M.

6.4.9. Let $\hat\Delta$ be the Hodges-Lehmann estimate of $\Delta$ based on U, (6.3.1).
Suppose $m, n \to \infty$ in such a way that $m/N \to \lambda$, $0 < \lambda < 1$,
N = m + n. Without loss of generality let $\Delta = 0$ and show that

$$\sqrt{N}\,\hat\Delta \xrightarrow{D} Z \sim MVN(0, [\lambda(1 - \lambda)]^{-1}DVD')$$

where DVD' is described in Exercise 6.4.5(b).
Some Results from Mathematics
and Mathematical Statistics


In this appendix we list many of the results from asymptotic or limiting
distribution theory that are used throughout the book. We either provide an
argument for the results or give a reference where a discussion can be
found. The asymptotic theory is important for two reasons. First, the
distribution of test statistics and estimates can be approximated for
practical purposes and second, the asymptotic distributions of test statistics and
estimates provide the basis for large sample power and efficiency
comparisons. Other mathematical results, such as a short discussion of Stieltjes
integrals, are included.

We first define the two main types of limiting behavior for sequences of
random variables. We will denote a sequence of random variables
$Z_1, Z_2, \ldots$ by $\{Z_n\}$.

Definition A1. A sequence of random variables $\{Z_n\}$ converges in
probability to a constant c if for every $\varepsilon > 0$,
$\lim P(|Z_n - c| > \varepsilon) = 0$. This is denoted by $Z_n \xrightarrow{P} c$ and we say
$Z_n$ converges in probability to c. Note that c may be replaced by a random
variable Z in the definition. In this case we say $Z_n$ converges in
probability to Z.

Definition A2. A sequence of random variables $\{Z_n\}$ converges in
distribution to the random variable Z if $\lim F_n(t) = F(t)$ at continuity points
of F, where $F_n$ and F are the cumulative distribution functions (cdfs) of
$Z_n$ and Z respectively. We write $Z_n \xrightarrow{D} Z$.

A random variable Z is degenerate at the constant c if P(Z = c) = 1.
Definition A2 contains Definition A1, since $Z_n \xrightarrow{D} Z$, Z degenerate at c,
if and only if $Z_n \xrightarrow{P} c$. This is discussed by Hogg and Craig (1978, p. 186)
in their treatment of stochastic convergence. However, since convergence in
probability and convergence in distribution are very different types of
behavior, it is convenient to keep the definitions separate.

The standard normal distribution has mean 0 and variance 1, denoted
by n(0, 1). The cdf will be denoted by $\Phi(t)$. Often, in Definition A2, Z will
have a n(0, 1) distribution, denoted $Z \sim n(0, 1)$, and we say $Z_n$ is
asymptotically (or limiting) n(0, 1).

The first theorem strengthens convergence in distribution for cases such
as normal limiting distributions.

Theorem A1. (Polya). Suppose in Definition A2 that $Z_n \xrightarrow{D} Z$ and Z has a
continuous cdf; then $F_n(t) \to F(t)$ uniformly in t. Hence
$\sup_t |F_n(t) - F(t)| \to 0$ as $n \to \infty$. This is proved by Parzen (1960, p. 438).

Theorem A2. Suppose g(x) is a continuous function.

a. If $Z_n \xrightarrow{P} c$ then $g(Z_n) \xrightarrow{P} g(c)$.

b. If $Z_n \xrightarrow{D} Z$ then $g(Z_n) \xrightarrow{D} g(Z)$.

For a proof that includes the case of random vectors as well as random
variables see Serfling (1980, p. 24).

The next result, due to Slutsky, combines convergence in probability and
distribution. It will be used extensively throughout the text.

Theorem A3. (Slutsky). Suppose $Z_n \xrightarrow{D} Z$ and $Y_n \xrightarrow{P} c$, a finite constant.
Then

a. $Z_n + Y_n \xrightarrow{D} Z + c$.

b. $Z_nY_n \xrightarrow{D} cZ$ and if c = 0 then $Z_nY_n \xrightarrow{P} 0$.

c. $Z_n/Y_n \xrightarrow{D} Z/c$ provided $c \ne 0$.

For a proof (which covers the case of random vectors also), see Serfling
(1980, p. 19). Note that $Y_n$ may be degenerate at $c_n$, so $Y_n \xrightarrow{P} c$ may be
replaced by $c_n \to c$ in the theorem.
We now turn to the problem of establishing criteria for determining
convergence in probability and convergence in distribution.

Theorem A4. (Chebyshev). For any random variable Z and any positive
constant c,

$$P(|Z| \ge c) \le \frac{EZ^2}{c^2}.$$

This is usually proved in introductory texts in mathematical statistics, for
example, Hogg and Craig (1978, p. 58).

Theorem A5. For the sequence of random variables $\{Z_n\}$, suppose
$EZ_n \to c$ and $\mathrm{Var}\,Z_n \to 0$. Then $Z_n \xrightarrow{P} c$.

Proof. From Theorem A4, $P(|Z_n - c| \ge \varepsilon) \le E(Z_n - c)^2/\varepsilon^2$. But
$E(Z_n - c)^2 = E(Z_n - EZ_n + EZ_n - c)^2 = \mathrm{Var}\,Z_n + [EZ_n - c]^2 \to 0$.
Hence $Z_n \xrightarrow{P} c$ by Definition A1. Note this theorem covers the case
$EZ_n = \mu$ and $\mathrm{Var}\,Z_n \to 0$.

Suppose $X_1, X_2, \ldots$ are independent and identically distributed (i.i.d.)
random variables with $EX_i = \mu$ and $\mathrm{Var}\,X_i = \sigma^2 < \infty$. Then
$\bar X \xrightarrow{P} \mu$. This follows at once from Theorem A5 with $Z_n = \bar X$ and is
known as the Weak Law of Large Numbers. Much stronger results are
possible. For example the Strong Law of Large Numbers asserts that $\bar X$
converges to $\mu$ with probability 1 provided only $EX_i = \mu < \infty$. See
Serfling (1980, p. 27) for a discussion.

We now turn to convergence in distribution. The major tool is the
Central Limit Theorem. We give the most general form of this theorem and
then specialize it to various types of situations that are encountered in the
text. We use $I(\cdot)$ to denote the indicator function: I(A) = 1 if A occurs and
0 otherwise.

Theorem A6. (Lindeberg Central Limit Theorem). For each n let
$Z_{1n}, \ldots, Z_{nn}$ be independent random variables with $EZ_{in} = 0$ and
$\mathrm{Var}\,Z_{in} = \sigma_{in}^2 < \infty$ for $i = 1, \ldots, n$. Let $B_n^2 = \sum_{i=1}^{n}\sigma_{in}^2$. If for
each $\varepsilon > 0$,

$$\lim_{n\to\infty} B_n^{-2}\sum_{i=1}^{n} E\big[Z_{in}^2\,I(|Z_{in}| > \varepsilon B_n)\big] = 0,$$

then

$$\sum_{i=1}^{n} Z_{in}/B_n \xrightarrow{D} Z \sim n(0, 1).$$

This theorem is proved by Chung (1968, p. 187). The result is quite
general and in fact the condition is necessary as well as sufficient. The
double subscripts generate a triangular array of random variables and the
independence is only assumed within the rows of the array, that is, for
$Z_{1n}, \ldots, Z_{nn}$. Also note that the random variables do not need to be
identically distributed.

We state next a very important theorem from real analysis that provides
conditions under which limits and derivatives can be passed through
integrals. In its statistical form it allows interchange of differentiation and
expectation. We illustrate the theorem by deriving the usual central limit
theorem from Theorem A6.

Theorem A7. (Lebesgue Dominated Convergence Theorem). Suppose
h(x) is an integrable function and $\{g_n(x)\}$ is a sequence of (measurable)
functions such that $|g_n(x)| \le h(x)$ and $\lim g_n(x) = g(x)$. Then
$\lim \int g_n(x)\,dx = \int [\lim g_n(x)]\,dx = \int g(x)\,dx$. For a proof see Royden
(1968, p. 229). In its statistical version we have a random variable X and a
sequence of functions $\{g_n(x)\}$ such that $g_n(x) \to g(x)$. We also have a
function h(x) such that $|g_n(x)| \le h(x)$ for all n and $E|h(X)| < \infty$. Then
$\lim Eg_n(X) = Eg(X)$.

Theorem A8. (Central Limit Theorem).

a. Suppose $X_1, X_2, \ldots$ are i.i.d. random variables with $EX_i = \mu$ and
$\mathrm{Var}\,X_i = \sigma^2$, $0 < \sigma^2 < \infty$. Then
$n^{-1/2}\sum (X_i - \mu)/\sigma \xrightarrow{D} Z \sim n(0, 1)$.

b. Suppose $X_1, X_2, \ldots$ are i.i.d. p-component random vectors with
$EX_i = \mu$ and covariance matrix $\Sigma$, positive definite. Then

$$n^{1/2}(\bar X_1 - \mu_1, \ldots, \bar X_p - \mu_p)' \xrightarrow{D} Z \sim MVN(0, \Sigma)$$

where $\mu_j$ is the mean of the jth component. We will derive (a) from
Theorem A6; see the remarks following A11 for (b).
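Proof. a. In Theorem A6, let $Z_{in} = (X_i - \mu)$, $i = 1, \ldots, n$; then
$EZ_{in} = 0$, $\mathrm{Var}\,Z_{in} = \sigma^2 < \infty$ and $B_n^2 = n\sigma^2$. Without loss of
generality let $\mu = 0$. Now

$$B_n^{-2}\sum_{i=1}^{n} E\big[X_i^2\,I(|X_i| > \varepsilon B_n)\big] = \sigma^{-2}E\big[X_1^2\,I(|X_1| > \varepsilon\sigma n^{1/2})\big].$$

In Theorem A7, let $g_n(x) = x^2 I(|x| > \varepsilon\sigma n^{1/2})$ and let $h(x) = x^2$. Then
$E|h(X)| = EX^2 < \infty$ and $g_n(x) \to 0$; hence
$EX_1^2 I(|X_1| > \varepsilon\sigma n^{1/2}) \to E(0) = 0$ and the Lindeberg condition is
satisfied.

There are various ways to write the conclusion of the Central Limit
Theorem. Often in statistics we deal with averages, $\bar X$. When $X_1, \ldots, X_n$
are i.i.d. with mean $\mu$ and finite variance $\sigma^2$, we conclude from Theorem A8
that $n^{1/2}(\bar X - \mu) \xrightarrow{D} Z \sim n(0, \sigma^2)$, and we may say $n^{1/2}(\bar X - \mu)$ is
asymptotically (or limiting) $n(0, \sigma^2)$. This means for large n we can make
the following approximation:

$$P\left(\frac{\sqrt{n}(\bar X - \mu)}{\sigma} \le t\right) \doteq \Phi(t)$$

or

$$P(\bar X \le x) \doteq \Phi\left(\frac{\sqrt{n}(x - \mu)}{\sigma}\right).$$

These approximations can then be used to approximate critical values of
tests based on $\bar X$.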
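
The quality of the second approximation can be examined in a case where the exact distribution of $\bar X$ is available. The sketch below is an illustration, not part of the original text: it assumes Bernoulli(p) observations, so that $n\bar X$ is binomial, and the function names and the values n = 100, p = 0.5, x = 0.55 are hypothetical choices.

```python
from math import comb, floor, sqrt
from statistics import NormalDist


def exact_cdf_of_mean(n, p, x):
    """P(Xbar <= x) for n i.i.d. Bernoulli(p) variables, via the binomial pmf."""
    k_max = floor(n * x + 1e-9)  # largest count k with k/n <= x
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_max + 1))


def normal_approximation(n, mu, sigma, x):
    """Phi(sqrt(n) (x - mu) / sigma), the central limit approximation."""
    return NormalDist().cdf(sqrt(n) * (x - mu) / sigma)


n, p, x = 100, 0.5, 0.55
exact = exact_cdf_of_mean(n, p, x)
approx = normal_approximation(n, p, sqrt(p * (1 - p)), x)
error = abs(exact - approx)
```

For a discrete distribution such as this one the approximation error does not vanish quickly; the rate at which it shrinks is the subject of the Berry-Esseen theorem stated later in this appendix.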

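We next discuss some additional central limit type results that are useful
in nonparametrics.

Theorem A9. Suppose $W_1, W_2, \ldots$ are i.i.d. with $EW_i = 0$ and
$\mathrm{Var}\,W_i = \sigma^2$, $0 < \sigma^2 < \infty$. Define $S = \sum_{i=1}^{n} a_iW_i/n^{1/2}$. If

$$\frac{\max|a_i|}{\left(\sum a_i^2\right)^{1/2}} \to 0,$$

then $S/(\mathrm{Var}\,S)^{1/2} \xrightarrow{D} Z \sim n(0, 1)$ with $\mathrm{Var}\,S = \sigma^2\sum a_i^2/n$.

Proof. In Theorem A6 define $Z_{in} = a_iW_i/n^{1/2}$; then $EZ_{in} = 0$,
$\mathrm{Var}\,Z_{in} = a_i^2\sigma^2/n < \infty$, and $B_n^2 = \sigma^2\sum a_i^2/n$. The Lindeberg
condition reduces to consideration of

$$\frac{1}{\sigma^2\sum a_i^2}\sum_{i=1}^{n} a_i^2\,E\left[W_i^2\,I\left(|W_i| > \frac{\varepsilon\sigma\left(\sum a_j^2\right)^{1/2}}{|a_i|}\right)\right]
\le \frac{1}{\sigma^2}\,E\left[W_1^2\,I\left(|W_1| > \frac{\varepsilon\sigma\left(\sum a_j^2\right)^{1/2}}{\max|a_i|}\right)\right].$$

From the hypothesis of the theorem, $\left(\sum a_i^2\right)^{1/2}/\max|a_i| \to \infty$ and now the
argument is identical to that given in the proof of Theorem A8. Hence the
Lindeberg condition holds and the proof is complete.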


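A simple variation on this theorem is quite useful in identifying the
asymptotic parameters. The proof is adapted from the proof of a lemma by
Arnold (1980).

Theorem A10. Suppose $W_1, W_2, \ldots$ are i.i.d. with $EW_i = 0$ and
$\mathrm{Var}\,W_i = \sigma^2$, $0 < \sigma^2 < \infty$. Define $S = \sum_{i=1}^{n} a_iW_i/n^{1/2}$. If

$$\frac{1}{n}\sum_{i=1}^{n} a_i^2 \to a^2, \quad 0 < a^2 < \infty,$$

then $S \xrightarrow{D} Z \sim n(0, \sigma^2a^2)$ and $S/(\mathrm{Var}\,S)^{1/2} \xrightarrow{D} Z \sim n(0, 1)$.

Proof. First note that, as $n \to \infty$,

$$\frac{a_n^2}{n} = \frac{1}{n}\sum_{i=1}^{n} a_i^2 - \frac{n-1}{n}\cdot\frac{1}{n-1}\sum_{i=1}^{n-1} a_i^2 \to a^2 - a^2 = 0.$$

Hence as $n \to \infty$, $n^{-1/2}|a_n| \to 0$.

The proof follows from Theorem A9 provided $n^{-1/2}\max|a_i| \to 0$. Suppose
$n^{-1/2}\max|a_i|$ does not tend to zero. Then there exists an $\varepsilon > 0$ such
that for any integer N there exists an integer $n_N > N$ such that
$n_N^{-1/2}\max|a_i| > \varepsilon$ for $1 \le i \le n_N$. By letting N = 1, 2, ..., it is
possible to construct a sequence $n_1 < n_2 < \cdots$ of such values. For $n_t$ let
$m_t \le n_t$ be such that $|a_{m_t}| = \max|a_i|$, $1 \le i \le n_t$. Now

$$m_t^{-1/2}|a_{m_t}| \ge n_t^{-1/2}\max_{1 \le i \le n_t}|a_i| > \varepsilon$$

which contradicts the first paragraph. Hence $n^{-1/2}\max|a_i| \to 0$ and
$S/(\mathrm{Var}\,S)^{1/2} \xrightarrow{D} Z \sim n(0, 1)$. Further, $\mathrm{Var}\,S = \sigma^2\sum a_i^2/n$ so
$S \xrightarrow{D} Z \sim n(0, \sigma^2a^2)$ by Slutsky's Theorem A3 and the proof is complete.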


We will need multivariate versions of some of the above limit theorems.
These multivariate extensions are often proved using the "Cramer-Wold
device."

Theorem A11. A sequence of random vectors $\{Z_n\}$ converges in
distribution to Z if and only if the sequence of (univariate) random variables
$\{\lambda'Z_n\}$ converges in distribution to $\lambda'Z$ for every vector of constants
$\lambda$ such that $\lambda'\lambda = 1$. See Serfling (1980, p. 18) for a proof.

The multivariate central limit theorem, Theorem A8(b), follows from the
univariate case with an application of Theorem A11. We now consider
multivariate versions of Theorem A9.


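Theorem A12. Suppose $W_1, W_2, \ldots$ are i.i.d. with $EW_i = 0$ and
$\mathrm{Var}\,W_i = \sigma^2$, $0 < \sigma^2 < \infty$. Define $S_j = \sum_{i=1}^{n} a_{ij}W_i/n^{1/2}$ for
$j = 1, \ldots, p$. Let

$$S = \begin{pmatrix} S_1 \\ \vdots \\ S_p \end{pmatrix} = \frac{1}{n^{1/2}}\,A'W$$

where

$$W = \begin{pmatrix} W_1 \\ \vdots \\ W_n \end{pmatrix} \quad\text{and}\quad
A = \begin{pmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{np} \end{pmatrix}.$$

If

$$\frac{1}{n}A'A \to \Lambda, \quad\text{positive definite,}$$

then $S \xrightarrow{D} Z \sim MVN(0, \sigma^2\Lambda)$.

Proof. Let $\lambda$ be a p-vector of length 1 and consider
$T = \lambda'S = \lambda'A'W/n^{1/2}$. Let $C = A\lambda$ so $T = \sum C_iW_i/n^{1/2}$ and we need
to show that C satisfies the condition in Theorem A10.

Note that, since $\Lambda$ is positive definite,
$n^{-1}\sum C_i^2 = n^{-1}\lambda'A'A\lambda \to \lambda'\Lambda\lambda > 0$, so we can take
$a^2 = \lambda'\Lambda\lambda > 0$. Hence $\lambda'S \xrightarrow{D} Z \sim n(0, \sigma^2\lambda'\Lambda\lambda)$ and the
theorem follows from Theorem A11.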

The next theorem is a slight variation on Theorem A12 that is sometimes 
useful. 

Theorem A13. Suppose the same setup as in Theorem A12. If the covari- 
ance matrix of $S$ converges to a positive definite matrix, then $S$ has an 
asymptotic multivariate normal distribution. 

The proof is immediate upon writing out $\mathrm{Cov}(S)$. 

As an example of the application of Theorem A12 we consider the 
asymptotic distribution of the vector of estimates of the regression coeffi- 
cients in the linear model. Let $Y = X\beta + e$, where $Y$ is an $n \times 1$ 
vector of observations, $X$ an $n \times p$ design matrix of full rank, 
$\beta$ a $p \times 1$ vector of regression coefficients, and $e$ an 
$n \times 1$ vector of i.i.d. random variables. We suppose $Ee_i = 0$, 
$\mathrm{Var}\, e_i = \sigma^2$, $0 < \sigma^2 < \infty$. Finally, suppose 
$n^{-1} X'X \to \Sigma$, positive definite. 

The least squares (maximum likelihood under normality) estimate of $\beta$ 
is $\hat\beta = (X'X)^{-1} X'Y$. This can be written as follows: 

$$\hat\beta = (X'X)^{-1} X'(X\beta + e) = \beta + (X'X)^{-1} X'e.$$

Hence, identifying with the notation of Theorem A12, we have 

$$\sqrt{n}(\hat\beta - \beta) = \sqrt{n}\,(X'X)^{-1} X'e = \frac{1}{\sqrt{n}} A'e,$$

where $A' = (n^{-1} X'X)^{-1} X'$. Then it follows that 
$n^{-1} A'A = n(X'X)^{-1} \to \Sigma^{-1}$ and the condition of Theorem A12 
is satisfied. 

Hence Theorem A12 applies and $\sqrt{n}(\hat\beta - \beta)$ is asymptotically 
$MVN(0, \sigma^2 \Sigma^{-1})$, or $\hat\beta$ is approximately 
$MVN(\beta, \sigma^2 (X'X)^{-1})$. See Huber (1981, Section 12) for an 
alternative approach. 
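
As a sketch of how this argument specializes, consider the one-regressor case with no intercept, $Y_i = \beta x_i + e_i$. This special case is an illustration added here, not an example taken from the text; in it the limit $\Sigma$ is assumed to be a positive scalar.

```latex
\begin{align*}
\hat\beta - \beta &= \frac{\sum_{i=1}^{n} x_i e_i}{\sum_{i=1}^{n} x_i^2}, \\
\sqrt{n}\,(\hat\beta - \beta)
  &= \frac{n^{-1/2} \sum_{i=1}^{n} x_i e_i}{n^{-1} \sum_{i=1}^{n} x_i^2}.
\end{align*}
% Numerator: Theorem A10 with a_i = x_i and \sigma_a^2 = \Sigma gives the
% limit n(0, \sigma^2 \Sigma).
% Denominator: n^{-1} \sum x_i^2 \to \Sigma by assumption.
% Slutsky's Theorem A3 then combines the two limits:
\[
\sqrt{n}\,(\hat\beta - \beta) \stackrel{D}{\to} Z \sim
  n\!\left(0, \frac{\sigma^2 \Sigma}{\Sigma^2}\right) = n(0, \sigma^2 / \Sigma).
\]
```

This agrees with the general limit $MVN(0, \sigma^2 \Sigma^{-1})$ when $p = 1$.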

The Central Limit Theorem A8 for i.i.d. random variables with finite 
variance is the most common version discussed in statistics texts and 
courses. It says that $P(n^{1/2}(\bar{X} - \mu)/\sigma \le t) \to \Phi(t)$, 
where $EX_i = \mu$ and $\mathrm{Var}\, X_i = \sigma^2$, $0 < \sigma^2 < \infty$. 
We also know from Polya’s Theorem A1 that the convergence is uniform in $t$. 
We cannot tell at this point, however, how fast the convergence is taking 
place. With information on the third absolute moment, the rate can be 
established. 

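Theorem A14. (Berry-Esseen). Suppose $X_1, X_2, \ldots$ are i.i.d. with 
$EX_i = \mu$, $\mathrm{Var}\, X_i = \sigma^2$, $0 < \sigma^2 < \infty$, and 
$E|X_i - \mu|^3 = \rho$, $0 < \rho < \infty$. Then for all $n$ 

$$\sup_t \left| P\left( \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le t \right) - \Phi(t) \right| \le \frac{c\rho}{\sigma^3 \sqrt{n}},$$

where $c$ is a numerical constant independent of $n$. This is proved by Chung 
(1968, p. 206), who gives a numerical value for $c$ of approximately 28. 
Serfling (1980, p. 33) points out that the value of $c$ has been sharpened 
to .7975. 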
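
The rate in Theorem A14 can be checked numerically. The following is a minimal sketch, assuming Bernoulli($p$) observations (a choice made here for illustration, since the distribution of the sample mean is then exactly binomial) and the bound in the form $c\rho/(\sigma^3\sqrt{n})$ with $\rho = E|X_i - \mu|^3$ and $c = .7975$. The function names are hypothetical.

```python
import math

def normal_cdf(t):
    """Standard normal cdf Phi(t), computed from the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def bernoulli_clt_distance(n, p):
    """Exact sup over t of |P(sqrt(n)(Xbar - mu)/sigma <= t) - Phi(t)|
    when X_1, ..., X_n are i.i.d. Bernoulli(p)."""
    sigma = math.sqrt(p * (1.0 - p))
    worst, cdf = 0.0, 0.0
    for k in range(n + 1):
        # Standardized location of the jump at sum = k.
        t = (k - n * p) / (sigma * math.sqrt(n))
        phi = normal_cdf(t)
        # The step cdf can be farthest from Phi just before or at the jump.
        worst = max(worst, abs(cdf - phi))
        cdf += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        worst = max(worst, abs(cdf - phi))
    return worst

def berry_esseen_bound(n, p, c=0.7975):
    """c * rho / (sigma^3 sqrt(n)) for Bernoulli(p), rho = E|X - p|^3."""
    sigma = math.sqrt(p * (1.0 - p))
    rho = p * (1.0 - p) * ((1.0 - p) ** 2 + p**2)
    return c * rho / (sigma**3 * math.sqrt(n))

for n in (10, 100, 1000):
    print(n, bernoulli_clt_distance(n, 0.3), berry_esseen_bound(n, 0.3))
```

Because the cdf of the standardized mean is a step function, the supremum only needs to be examined at the jump points, which is why the loop compares $\Phi$ with the cdf immediately before and after each jump.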


This theorem is useful for establishing uniform convergence over a class 
of underlying distributions. As an application of Theorems A1 and A14 we 
have: 


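Theorem A15. Suppose $X_1, X_2, \ldots$ are i.i.d. with cdf $F_\theta(x)$, 
$\theta \in \Omega$. Suppose there exist constants $K$ and $M$ such that 
$0 < K < \sigma_\theta^2 < \infty$ and $\rho_\theta < M < \infty$ for all 
$\theta \in \Omega$. Then 

$$P_\theta\left( \frac{\sqrt{n}(\bar{X} - \mu_\theta)}{\sigma_\theta} \le t \right)$$

converges to $\Phi(t)$ uniformly in $\theta \in \Omega$ as well as in $t$. 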

The next theorem describes conditions under which the conclusion of 
the Central Limit Theorem continues to hold when $X_1, X_2, \ldots$ are no 
longer independent. The sequence $X_1, X_2, \ldots$ is called stationary if 
the joint distribution of $(X_i, X_{i+1}, \ldots, X_{i+r})$ is independent of 
$i$ for all $r$. The sequence is said to be $m$-dependent if 
$(X_1, \ldots, X_r)$ and $(X_s, X_{s+1}, \ldots)$ are independent provided 
$s - r > m$. The following theorem is a special case of a more general 
theorem due to Hoeffding and Robbins (1948). For further discussion see 
Fraser (1957, Section 6.4) and Serfling (1968a). 

Theorem A16. Suppose $X_1, X_2, \ldots$ is a stationary, $m$-dependent se- 
quence with $EX_i = \mu$ and $E|X_i|^3 < \infty$. Then 
$n^{1/2}(\bar{X} - \mu) \stackrel{D}{\to} Z \sim n(0, \sigma^2)$ with 
$\sigma^2 = \mathrm{Var}\, X_1 + 2 \sum_{j=1}^{m} \mathrm{Cov}(X_1, X_{1+j})$. 

Throughout the book the concept of an absolutely continuous function 
is important. A function $F(x)$ is absolutely continuous on the real line, 
provided there is an integrable function $f(x)$ such that for every $x < y$, 
$F(y) - F(x) = \int_x^y f(t)\,dt$. It then follows that $F(x)$ has a 
derivative equal to $f(x)$ (almost everywhere). Hence $F(x)$ is the 
indefinite integral of its derivative. When $F(x)$ is an absolutely 
continuous cumulative distribution function, the function $f(x)$ is the 
probability density function (pdf). 

Definition A3. The convolution of any two cdfs, $F_1(x)$ and $F_2(x)$, is 
defined by $F_1 * F_2(x) = \int F_1(x - y)\,dF_2(y) = P(X_1 + X_2 \le x)$, 
where $X_1, X_2$ are independent random variables with cdfs $F_1(x)$ and 
$F_2(x)$, respectively. The following theorem is important in the analysis 
of properties of the Wilcoxon rank procedures. See Definition A4 for the 
Stieltjes integral. 

Theorem A17. Suppose $F_1(x)$ and $F_2(x)$ are absolutely continuous cdfs 
with pdfs $f_1(x)$ and $f_2(x)$, respectively. Then the convolution 
$F_1 * F_2(x)$ is also absolutely continuous with density 
$f_1 * f_2(x) = \int f_1(x - y) f_2(y)\,dy$. A proof is given by Chung 
(1968, p. 135). The theorem allows differentiation under the integral that 
defines the convolution. Hence, if $F^*(x) = P(X_1 + X_2 \le x)$, where 
$X_1, X_2$ are i.i.d. with absolutely continuous $F(x)$, then 

$$\frac{d}{dx} F^*(x) = \frac{d}{dx} \int_{-\infty}^{\infty} F(x - y) f(y)\,dy = \int_{-\infty}^{\infty} f(x - y) f(y)\,dy.$$

For further discussion of this integral and the Wilcoxon rank procedures 
see Mehra and Sarangi (1967) or Olshen (1967). It is not necessary that the 
random variables be identically distributed to define the convolution. The 
proof of Theorem 4.4.2 contains an application of convolutions for discrete, 
not identically distributed random variables. 
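
The convolution density formula of Theorem A17 can be checked numerically. The sketch below assumes two exponential densities with rates 1 and 2 (an illustrative choice, not an example from the text), for which the integral has the closed form $2(e^{-x} - e^{-2x})$ for $x > 0$. The helper names are hypothetical, and the midpoint rule stands in for the exact integral.

```python
import math

def exp_pdf(rate):
    """Density of an exponential distribution; zero for negative arguments."""
    return lambda u: rate * math.exp(-rate * u) if u >= 0 else 0.0

def convolve_density(f1, f2, x, steps=2000):
    """Midpoint-rule approximation of the integral of f1(x - y) f2(y) dy.

    Both densities are assumed to vanish on the negative half-line, so the
    integrand is zero outside 0 <= y <= x and the range reduces to [0, x].
    """
    if x <= 0:
        return 0.0
    h = x / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += f1(x - y) * f2(y)
    return total * h

f1, f2 = exp_pdf(1.0), exp_pdf(2.0)
for x in (0.5, 1.0, 3.0):
    closed_form = 2.0 * (math.exp(-x) - math.exp(-2.0 * x))
    print(x, convolve_density(f1, f2, x), closed_form)
```

The two densities here are not identical, which matches the remark that identical distributions are not needed to define the convolution.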

The Lebesgue Dominated Convergence Theorem A7 has been used to 
exchange limits and integrals in Theorems A8 and A9. Another important 
application of Theorem A7 is the exchange of differentiation and integra- 
tion. Statistical applications involve exchange of differentiation and expec- 
tation. 

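Theorem A18. Suppose the function $h(x, \theta)$ is absolutely continuous as 
a function of $\theta \in \Omega$, an open interval. Suppose 
$(\partial/\partial\theta) h(x, \theta)$ exists and is bounded by $g(x)$ for 
all $\theta \in \Omega$. Suppose $\int (\partial/\partial\theta) h(x, \theta)\,dx$ 
and $\int g(x)\,dx$ exist. Then 
$(d/d\theta) \int h(x, \theta)\,dx = \int (\partial/\partial\theta) h(x, \theta)\,dx$. 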




In the statistical version of this theorem we suppose 
$E(\partial/\partial\theta) h(X, \theta)$ and $Eg(X)$ both exist. Then 
Theorem A18 implies that 
$(d/d\theta) Eh(X, \theta) = E(\partial/\partial\theta) h(X, \theta)$. 

Two inequalities from mathematics are also important in statistics. 

Theorem A19. (Cauchy-Schwarz Inequality). Suppose $g(x)$ and $h(x)$ are 
square integrable functions. Then 
$(\int g(x) h(x)\,dx)^2 \le \int g^2(x)\,dx \int h^2(x)\,dx$. 
Equality holds if and only if there exist constants $a$ and $b$, not both 
zero, such that $ag(x) + bh(x) = 0$. In the statistical version we have two 
random variables $X$ and $Y$ such that $EX^2 < \infty$ and $EY^2 < \infty$. 
Then $(EXY)^2 \le EX^2 EY^2$. Equality holds if and only if $aX + bY = 0$ 
with probability one. The statistical version is sometimes called the 
correlation inequality, since if $X$ and $Y$ have been standardized to have 
means 0 and variances 1, the Cauchy-Schwarz inequality is the same as 
$\rho^2 \le 1$, where $\rho$ is the correlation between $X$ and $Y$. 

Next, recall that a function $g(x)$ is convex if, for each 
$\lambda \in (0, 1)$, 
$g(\lambda x + (1 - \lambda) y) \le \lambda g(x) + (1 - \lambda) g(y)$. 

Theorem A20. (Jensen’s Inequality). Suppose $X$ and $g(X)$ have finite 
expectations. Suppose $g(x)$ is convex. Then $Eg(X) \ge g(EX)$. If $g(x)$ is 
strictly convex, then $Eg(X) > g(EX)$, unless $X$ is degenerate. See Fraser 
(1957, Section 2.2). The most common example is $g(x) = x^2$, so that 
$EX^2 \ge (EX)^2$ and $\mathrm{Var}\, X \ge 0$. 

We now discuss the Stieltjes integral which is convenient for defining the 
expectation of a random variable. 

Definition A4. Let $g(x)$ and $h(x)$ be given with $g(x)$ nonnegative, and 
let $a = x_0 < x_1 < \cdots < x_n = b$ be a partition of $[a, b]$. The 
Stieltjes integral of $g(x)$ with respect to $h(x)$ is defined by 

$$\int_a^b g(x)\,dh(x) = \lim \sum_{i=1}^{n} g(x_i)[h(x_i) - h(x_{i-1})],$$

where the limit is taken over partitions with $\max(x_i - x_{i-1}) \to 0$. 
Stieltjes integrals over the real line are defined by letting 
$a \to -\infty$ and/or $b \to +\infty$. We will say the Stieltjes integral 
for an arbitrary function $f(x)$ relative to $h(x)$ exists if 
$\int |f(x)|\,dh(x) < \infty$. 

A simple example is provided by the distribution function that assigns 
mass 1 to the point $y$. Let $\delta_y(x) = 1$ if $x \ge y$ and 0 otherwise. 
Then, when $g(x)$ is continuous, $\int g(x)\,d\delta_y(x) = g(y)$. This 
follows immediately from the definition because 
$\delta_y(x_i) - \delta_y(x_{i-1}) = 1$ only when $x_{i-1} < y \le x_i$, and 
$g(x_i) \to g(y)$ as $x_i \to y$. This is simply a formal way of showing that 
if $X$ is degenerate at $y$ and if $g(x)$ is continuous, then $Eg(X) = g(y)$. 

Suppose $F(x)$ is a discrete distribution function with jumps at 
$x_1, x_2, \ldots$; then, when the expectation exists, we have 
$Eg(X) = \sum g(x_i)[F(x_i) - F(x_{i-1})] = \int g(x)\,dF(x)$. For example, 
let $F_n(x)$ denote the empirical distribution function [$F_n(x)$ puts mass 
$1/n$ at each of the observed values $x_1, x_2, \ldots, x_n$]; then 
$\int x\,dF_n(x) = \sum x_i [F_n(x_i) - F_n(x_{i-1})] = \sum x_i (1/n) = \bar{x}$, 
the sample mean. When the distribution function $F(x)$ has a density 
$f(x) = F'(x)$, we have $Eg(X) = \int g(x) f(x)\,dx = \int g(x)\,dF(x)$, when 
it exists. Hence, using the Stieltjes integral, we do not need to separate 
the discrete and continuous cases to define expectation. In general, if $X$ 
has cdf $F(x)$, discrete or continuous, then $Eg(X) = \int g(x)\,dF(x)$, when 
it exists. As pointed out previously, this reduces to the correct equations 
for the individual cases. 
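
The empirical distribution example can be reproduced directly from Definition A4. The sketch below is an illustration under stated assumptions: the sample values, the interval $[0, 10]$, and the mesh of 0.001 are arbitrary choices made here, the function names are hypothetical, and a fine partition stands in for the limit over partitions.

```python
def empirical_cdf(sample):
    """Return F_n, the step function putting mass 1/n at each observed value."""
    n = len(sample)
    return lambda x: sum(1 for v in sample if v <= x) / n

def stieltjes_sum(g, h, partition):
    """Sum of g(x_i) [h(x_i) - h(x_{i-1})] over consecutive partition points."""
    return sum(g(hi) * (h(hi) - h(lo))
               for lo, hi in zip(partition, partition[1:]))

sample = [2.0, 3.5, 3.5, 7.0, 9.0]
F_n = empirical_cdf(sample)

# Partition of [0, 10] with mesh 0.001; all observations lie inside it.
steps = 10_000
partition = [10.0 * i / steps for i in range(steps + 1)]

approx_mean = stieltjes_sum(lambda x: x, F_n, partition)
sample_mean = sum(sample) / len(sample)
print(approx_mean, sample_mean)
```

Only the subintervals that contain an observation contribute to the sum, since $F_n$ is constant elsewhere; a tied value such as 3.5 simply contributes a jump of $2/n$.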

Most of the properties of the Riemann integral extend to the Stieltjes 
integral. Integration by parts is often useful and the formula is given by 

$$\int_a^b g(x)\,dh(x) = g(b)h(b) - g(a)h(a) - \int_a^b h(x)\,dg(x).$$

Cramér (1946, Chapter 7) contains a development of Stieltjes integrals. 
Summation formulas for integers appear frequently in the computations 
of moments for rank sums. We list several of the most common equations. 

Theorem A21. 

$$\sum_{i=1}^{n} i = n(n + 1)/2,$$

$$\sum_{i=1}^{n} i^2 = n(n + 1)(2n + 1)/6.$$

Proof. We will describe a device that can be used to develop the formulas 
for any $k$. Note that 

$$\sum_{i=2}^{n+1} i^k - \sum_{i=1}^{n} i^k = (n + 1)^k - 1.$$

Make the change of variable $j = i - 1$ in the first sum on the left. Then 
$\sum_{i=2}^{n+1} i^k = \sum_{j=1}^{n} (j + 1)^k$. Combining this with the 
second sum we have 

$$\sum_{j=1}^{n} (j + 1)^k - \sum_{j=1}^{n} j^k = \sum_{j=1}^{n} [(j + 1)^k - j^k] = (n + 1)^k - 1.$$

Now $[(j + 1)^k - j^k]$ is a polynomial in $j$ of degree $k - 1$, which, when 
summed on $j$, allows us to solve for $\sum_{j=1}^{n} j^{k-1}$ in terms of 
prior sums. 

For example, let $k = 2$; then 
$\sum_{j=1}^{n} [(j + 1)^2 - j^2] = \sum_{j=1}^{n} (2j + 1) = 2 \sum_{j=1}^{n} j + n = (n + 1)^2 - 1$. 
This yields $\sum_{j=1}^{n} j = n(n + 1)/2$. If we let $k = 3$, then 
$3 \sum_{j=1}^{n} j^2 + 3 \sum_{j=1}^{n} j + n = (n + 1)^3 - 1 = n^3 + 3n^2 + 3n$ 
can be solved for $\sum_{j=1}^{n} j^2$. 
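
The telescoping device in the proof translates into a short recursion. The following is a minimal sketch; the function name `power_sums` is hypothetical, and the code indexes by the power being summed, so the identity for exponent $k + 1$ is used to solve for $\sum j^k$. Expanding $(j + 1)^{k+1} - j^{k+1}$ with the binomial theorem gives $\sum_{r=0}^{k} \binom{k+1}{r} \sum_{j=1}^{n} j^r = (n + 1)^{k+1} - 1$.

```python
from math import comb

def power_sums(n, k_max):
    """Return [S_0, ..., S_k_max], where S_k is the sum of j^k for j = 1..n.

    Each S_k is solved from the telescoping identity
    sum_{r=0}^{k} C(k+1, r) S_r = (n + 1)^(k+1) - 1,
    using the previously computed lower-order sums.
    """
    sums = []
    for k in range(k_max + 1):
        rhs = (n + 1) ** (k + 1) - 1
        lower = sum(comb(k + 1, r) * sums[r] for r in range(k))
        # The coefficient of S_k is C(k+1, k) = k + 1; the division is exact.
        sums.append((rhs - lower) // (k + 1))
    return sums

n = 10
s0, s1, s2, s3 = power_sums(n, 3)
print(s1, n * (n + 1) // 2)
print(s2, n * (n + 1) * (2 * n + 1) // 6)
```

Working in integers keeps the recursion exact, so the results can be compared with the closed forms of the theorem without rounding error.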
Asymptotic local power: 
sample size required for Wilcoxon signed 
rank statistic, 64 
sign test, 125 

Wilcoxon signed rank test, 61 
Asymptotic maximin rank test, 120 
Asymptotic minimax estimator, 121 
Asymptotic relative efficiency: 
Asymptotic size of a test, 6 
Berry-Esseen theorem, 305 
Bonferroni’s inequality, 186 
Breakdown, see Tolerance 

Cauchy-Schwarz inequality, 307 
Completely randomized design, 168 
15 
Consistency of test, 7 
Contamination model, 117 
Cramér-Wold device, 303 

Density estimator, 244 
Dispersion: 
Durbin’s test, 219 
efficiency 

Fisher information, 104 
upper bound on efficacy, 108 
Friedman’s test, 197 
asymptotic efficiency relative to F test 
Galton’s statistic: 
one sample model, 40 
two-sample model, 171 
Generalized variance, 242, 288 
efficacy, 103 
efficacy, 165 
properties under the null hypothesis 
Jensen’s inequality, 307 
partitioned model, 251 

rank test based on full model residuals, 252 
simple regression model, 224-231 
test based on R-estimators, the Wald test, 
263 

two-way layout, 270 

Locally most powerful rank test (LMPRT): 
approximate score function, 147 
asymptotic distribution, 149 
one-sample model, 167 
scale model, Savage test, 176-177 
two-sample model, 144-145 
Location functional, 20 
general scores Hodges-Lehmann estimator, 
99 

Mann’s test for trend, 130 
Mann-Whitney-Wilcoxon rank test: 
asymptotic distribution in general, 174 
confidence interval, 139 
consistency, 159 
data example, 140 
Edgeworth approximation, 139 
efficacy, 163 

Hodges-Lehmann estimator, 139 
locally most powerful rank test, 145-146 
mean and variance in general, 158 
in multivariate model, 289-291 
projection, 138 

properties under null hypothesis, 134-137 
in restrictedly randomized design, 170 
true level when population shapes differ, 

174 

true level in serial correlation model, 140 
Mathisen’s median test, 172-173 
m-dependence, 305 
Mean: 

functional, 20 
influence curve, 21-22 
tolerance, 18 
Median, 2 

asymptotic distribution, 23, 77 
functional, 20 
influence curve, 22 
in multivariate model, 293 
sensitivity curve, 21 
tolerance, 18 

test (two sample model): 

Mathisen's test, 172-173 
Mood’s test, 146, 150 


Median of Walsh averages, 39 
asymptotic distribution, 46, 77, 83 
in multivariate model, 294 
tolerance, 43-44 
Midranks, 140 
Minitab commands: 

Mann-Whitney-Wilcoxon methods, 140 
in regression, linear model, 264 
Wilcoxon signed rank methods, 39 
Modified sign test, 93 
Mood’s median test, 146, 150 
confidence interval, 154 
distribution under the null hypothesis, 151 
Hodges-Lehmann estimator, 153 
in multivariate model, 295 
relationship to sign statistic, 155, 167 
Multiple comparisons: 
one sample model: 

asymptotic efficiency relative to t test, 110 
asymptotic efficiency relative to Wilcoxon 
signed rank test, 115 
efficacy, 110 
estimator, 103 
score function, 91 
two sample model, score function, 145, 150 

One-step R-estimator, 243 
asymptotic distribution, 277 
one-way layout, 191 
Order statistic, 172 

Page’s test, 199 

and Spearman’s rank correlation, 209 
Paired data design, 30, 168 
Pitman regularity conditions, 64-65 
projection, 220 
Stochastic ordering, 139 

Strongly unimodal density, 109, 114, 117 

Sums of powers of integers, 308 

Stylized sensitivity curve, 20 

Symmetry: 

diagonal in multivariate model, 282 
distribution of differences, 123 
sign test, 17 

Wilcoxon signed rank test, 41 
Tolerance: 

estimation, 18, 41-42 
testing, 18 

Translation statistic, 16 
Trimmed mean: 
asymptotic distribution, 27 
influence curve, 27 
tolerance, 18 
t statistic: 
efficacy, 70 

graphical representation, 13 

true level in serial correlation model, 25 

Welch’s, 159, 162 

Unbiased test, 12, 26 

Uniformly most powerful test, see Sign test 

Walsh averages, 38 
estimates of location, 41 
median of, 39 

relationship to general signed rank statistics, 
94 

relationship to Wilcoxon signed rank 
statistic, 38 

Welch’s t test, 159, 162 
Wilcoxon signed rank test, 31-38 
asymptotic distribution, 37, 56, 125 
asymptotic linearity, 80 
asymptotic local power, 57, 61, 83-84 


in completely randomized design, 170 
confidence interval, 39, 82 
consistency, 49 

counting form, Walsh averages, 38 
data example, 40 
Edgeworth approximation, 37 
efficacy, 70 

efficiency relative to normal scores test, 115 
efficiency relative to t test, 72 
for grouped data, mixed tests, 131 
Hodges-Lehmann estimator, 39 
in multivariate model, 283-285 
projection, 54 

recurrence equation for the null distribution, 
171 

summary of properties, 84 
ties and zeros, 41 
tolerance, 124 

true level in serial correlation model, 25, 86 
Window width, 244 

Winsorized Wilcoxon signed rank test, 91 
confidence interval, 96 
efficacy, 107 

efficiency relative to Wilcoxon signed rank 
test, 107 

Hodges-Lehmann estimator, 95 
influence curve of Hodges-Lehmann 
estimator, 102 
in linear model, 264-265 
maximin test, 122 
tolerance, 92, 98 



